A Novel Approach To Automatically Extracting Main Content of Web News

Abstract

The Web has recently become a data repository, and much research has been devoted to obtaining relevant information from it. The typical function of Web news extraction is to locate the useful content text and filter out noise; these two issues keep Web news extraction an open research problem. In this paper, we describe an approach that clusters pages sharing a common extraction path and automatically locates the main text passages. Our approach applies to structured Web pages. Moreover, we developed an extraction system using our algorithm. Experiments on several important online news sites show that the approach achieves higher extraction accuracy than the RTDM algorithm.

Bibliographic details

Source
E-Business and Information System Security, 2009. EBISS '09 | 2009 | pp. 1-4 | 4 pages
Conference location: Wuhan (CN)
Authors
Xuan Wang; Weiping Wang; Bowen Liu; Zhen Wang; Xicai Wang
Author affiliation
Bus. Intell. Lab., Univ. of Sci. Technol. of China, Hefei
Original format: PDF
Keywords
Web sites; content management; information filtering; pattern clustering; text analysis; Web site; automatic Web new content extraction; content text; data repository; noises filtering; structural Web page clustering

Similar literature

1. Content annotation for the semantic web: an automatic web-based approach [J]. David Sanchez, David Isern, Miquel Millan. Knowledge and Information Systems. 2011, No. 3
2. Content annotation for the semantic web: an automatic web-based approach [J]. David Sánchez, David Isern, Miquel Millan. Knowledge and Information Systems. 2011, No. 3
3. A Pure Visual Approach for Automatically Extracting and Aligning Structured Web Data [J]. Estuka Fadwa, Miller James. ACM Transactions on Internet Technology. 2019, No. 4
4. A Novel Approach To Automatically Extracting Main Content of Web News [C]. Xuan WANG, WeiPing WANG, Bowen LIU. International Conference on E-Business and Information System Security. 2009
5. Unbalanced frame: A content analysis of four main American newspapers and their coverage of Colombia and the drug issue. [D]. Velez Galvis, Olga Elena. 2005
6. Automatic Detection of Pornographic and Gambling Websites Based on Visual and Textual Content Using a Decision Mechanism [O]. Yang Chen, Rongfeng Zheng, Anmin Zhou. 2020
7. A Study on Extracting News Contents from News Web Pages [O]. Yong-Gu Lee. 2009
