Conference: 12th Asia Pacific Web Conference (APWeb 2010)

ECON: An Approach to Extract Content from Web News Page


Abstract

This paper presents a simple but effective approach, named ECON, for fully automatic extraction of content from Web news pages. ECON represents a Web news page as a DOM tree and leverages substantial features of that tree. It first finds a snippet-node, a node that wraps part of the news content, and then backtracks from the snippet-node until it finds a summary-node, a node that wraps the entire news content. During backtracking, ECON removes noise. Experimental results show that ECON achieves high accuracy and fully satisfies the requirements of scalable extraction. Moreover, ECON can be applied to Web news pages written in many popular languages, such as Chinese, English, French, German, Italian, Japanese, Portuguese, Russian, Spanish, and Arabic. ECON is also easy to implement.
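The abstract does not say how ECON recognizes the snippet-node or the summary-node, so the following is only a minimal Python sketch of the snippet-node, backtracking, and summary-node flow under stated assumptions. The punctuation-count scoring, the `min_gain` stopping threshold, the rule that treats punctuation-free sibling subtrees as noise, and all names (`Node`, `TreeBuilder`, `find_snippet_node`, `find_summary_node`, `extract_content`, `SAMPLE_HTML`) are hypothetical choices made for illustration and are not taken from the paper.

```python
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style"}                      # assumed non-content containers
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}
PUNCTUATION = set(",.;:!?，。；：！？、")             # assumed signal of running prose


class Node:
    """Very small DOM-like node; children mix Node and str in document order."""
    def __init__(self, tag, parent=None):
        self.tag, self.parent, self.children = tag, parent, []

    def own_text(self):
        return " ".join(c for c in self.children if isinstance(c, str))

    def text(self):
        parts = (c if isinstance(c, str) else c.text() for c in self.children)
        return " ".join(p for p in parts if p)

    def nodes(self):
        yield self
        for child in self.children:
            if isinstance(child, Node):
                yield from child.nodes()


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        node = self.current                          # close nearest open element with this tag
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip() and self.current.tag not in SKIP_TAGS:
            self.current.children.append(data.strip())


def punct_count(s):
    return sum(ch in PUNCTUATION for ch in s)


def find_snippet_node(root):
    # Assumption: the snippet-node is the element whose direct text has the most punctuation.
    candidates = [n for n in root.nodes() if n is not root]
    return max(candidates, key=lambda n: punct_count(n.own_text()), default=None)


def find_summary_node(snippet, root, min_gain=0.3):
    node = snippet
    while node.parent is not None and node.parent is not root:
        parent = node.parent
        # Assumed noise rule: drop sibling subtrees that contain no prose punctuation.
        parent.children = [c for c in parent.children
                           if isinstance(c, str) or c is node or punct_count(c.text()) > 0]
        # Assumed stopping rule: stop once the parent adds little extra prose.
        if punct_count(parent.text()) < punct_count(node.text()) * (1 + min_gain):
            break
        node = parent
    return node


def extract_content(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    snippet = find_snippet_node(builder.root)
    if snippet is None:
        return ""
    return find_summary_node(snippet, builder.root).text()


SAMPLE_HTML = """<html><body>
<div id="nav"><a href="/">Home</a><a href="/world">World</a></div>
<div id="article">
<p>The council met on Monday, and the vote was close.</p>
<p>Officials said the plan, which was revised twice, will start in May.</p>
<p>Critics disagreed; they want a review.</p>
</div>
<div id="footer">Copyright 2010 Example News.</div>
</body></html>"""

if __name__ == "__main__":
    print(extract_content(SAMPLE_HTML))
```

On the made-up sample page, the second paragraph is picked as the snippet-node, the climb stops at the enclosing `div`, and the navigation and footer text stay outside the result. Real pages would likely need more careful scoring than this single punctuation heuristic.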
