HisTrace: A system for mining on news-related articles instead of web pages

Abstract

The Web now plays an important part in people's real-life activities. Researchers not only in computer science but also in sociology and economics may be interested in mining information directly related to real-life events, that is, news-related information on the Web. In this paper we propose a system that enables mining on news-related articles instead of raw web pages. The system has two functional tasks: 1) mining for news-related articles and 2) duplicate elimination. For the first task, we present a novel approach for determining the titles, contents and publication-times of news-related articles. Anchor texts are first used to extract titles from HTML bodies, and contents are then extracted right after the titles. After that, crawl-times and [...] are used to compute initial publication-times for all articles. Finally, times extracted from HTML bodies, URLs and anchor texts are used to determine precise publication-times for possible articles. For the second task, we describe a duplicate detection algorithm for news-related articles that is based on LCS (longest common subsequence) and achieves both high precision and high recall. The framework of this algorithm was presented as a general-purpose algorithm for web pages in a previously published paper. In this paper we explain why the algorithm is particularly suitable for news-related articles and present the corresponding implementation details. Evaluations show the effectiveness of our approaches.
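The abstract names LCS (longest common subsequence) as the basis of the duplicate detection step but gives no implementation details. As a rough illustration only, the following Python sketch shows one way an LCS-based duplicate check could look. Whitespace tokenization, normalization by the longer article's length, the `0.8` threshold, and the function names `lcs_length`, `lcs_similarity` and `is_duplicate` are all assumptions made for this example and are not taken from the paper.

```python
def lcs_length(a, b):
    """Return the length of the longest common subsequence of two sequences."""
    # Iterate over the longer sequence so each DP row has the shorter length.
    if len(a) < len(b):
        a, b = b, a
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the common subsequence ending before both elements.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the best result seen so far.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(text_a, text_b):
    """Similarity in [0, 1]: LCS length over the longer token count (assumed)."""
    tokens_a, tokens_b = text_a.split(), text_b.split()
    if not tokens_a or not tokens_b:
        return 0.0
    return lcs_length(tokens_a, tokens_b) / max(len(tokens_a), len(tokens_b))


def is_duplicate(text_a, text_b, threshold=0.8):
    """Treat two article bodies as duplicates above a hypothetical threshold."""
    return lcs_similarity(text_a, text_b) >= threshold
```

In this sketch, two article bodies that differ by one word in four score 0.75 and fall below the assumed threshold, while identical bodies score 1.0. The dynamic program runs in time proportional to the product of the two token counts, so a real system would likely need some form of candidate filtering before comparing pairs.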

Bibliographic record

Source: IEEE 2nd Symposium on Web Society, 2010, pp. 30-37 (8 pages)
Authors: Huang Lianen; Li Xiaoming
Original format: PDF
Chinese Library Classification: Automation technology, computer technology
Keywords: Duplicate Detection; News-related Articles; Publication Time; Web Mining

