Conference: E-Business and Information System Security, 2009. EBISS '09

A Novel Approach To Automatically Extracting Main Content of Web News


Abstract

Recently, the Web has become a data repository, and much research has been devoted to obtaining relevant information from it. The typical function of Web news extraction is to locate the useful content text and filter out the noise; both issues remain difficult, which makes Web news extraction an open research problem. In this paper, we describe an approach that clusters pages sharing a common extraction path and automatically locates the main text passages. Our approach applies to structured Web pages. Moreover, we developed an extraction system based on our algorithm. Experiments on several important online news sites show that the approach achieves higher extraction accuracy than the RTDM algorithm.
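The abstract does not give the details of the algorithm, so the following is only a minimal sketch of the general idea of grouping pages by a shared extraction path, not the authors' method. It assumes (hypothetically) that the "extraction path" of a page can be approximated by the tag path holding the most text, and the names `TextPathParser`, `main_text_path`, and `cluster_by_path` are invented for illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Tags whose text content is never article text.
SKIP_TAGS = {"script", "style"}


class TextPathParser(HTMLParser):
    """Accumulates the number of text characters found under each tag path."""

    def __init__(self):
        super().__init__()
        self.stack = []                   # list of (tag, label) for open tags
        self.text_len = defaultdict(int)  # tag path -> total text length

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        cls = dict(attrs).get("class")
        # Assumption: tag name plus class is enough to tell regions apart.
        self.stack.append((tag, f"{tag}.{cls}" if cls else tag))

    def handle_endtag(self, tag):
        tags = [t for t, _ in self.stack]
        if tag in tags:
            # Drop the last matching open tag and anything left unclosed above it.
            idx = len(tags) - 1 - tags[::-1].index(tag)
            del self.stack[idx:]

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.stack:
            return
        if any(t in SKIP_TAGS for t, _ in self.stack):
            return
        path = "/".join(label for _, label in self.stack)
        self.text_len[path] += len(text)


def main_text_path(html):
    """Returns the tag path holding the most text, or None if there is no text."""
    parser = TextPathParser()
    parser.feed(html)
    parser.close()
    return max(parser.text_len, key=parser.text_len.get, default=None)


def cluster_by_path(pages):
    """Groups page identifiers by their main-text path.

    `pages` maps a page identifier (for example a URL) to its HTML source.
    """
    clusters = defaultdict(list)
    for page_id, html in pages.items():
        clusters[main_text_path(html)].append(page_id)
    return dict(clusters)
```

Pages generated from the same template would fall into the same cluster under this heuristic, so a path found for one page could be reused to locate the main text in the others. A real system would need a more robust path representation and noise filtering than this sketch provides.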
