A Fast and Simple Method for Extracting Relevant Content from News Webpages


Abstract

We propose NCE, an efficient algorithm to identify and extract relevant content from news webpages. We define relevant as the textual sections that more objectively describe the main event in the article. This includes the title and the main body section, and excludes comments about the story and presentation elements. Our experiments suggest that NCE is competitive, in terms of extraction quality, with the best methods available in the literature. It achieves F1 = 90.7% in our test corpus containing 324 news webpages from 22 sites. The main advantages of our method are its simplicity and its computational performance. It is at least an order of magnitude faster than methods that use visual features. This characteristic is very suitable for applications that process a large number of pages.
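The abstract reports extraction quality as an F1 score but does not say how precision and recall are measured. As a rough illustration only, the sketch below assumes a token-level comparison between the extracted text and a hand-labelled reference; the function name `token_f1` and the whitespace tokenization are assumptions made here, not details taken from the paper.

```python
from collections import Counter


def token_f1(extracted: str, reference: str) -> float:
    """Token-level F1 between extracted text and a reference text.

    Assumption: tokens are whitespace-separated words and repeated
    words are counted with multiplicity (bag-of-words overlap).
    """
    ext_counts = Counter(extracted.split())
    ref_counts = Counter(reference.split())

    # Multiset intersection: tokens present in both texts.
    overlap = sum((ext_counts & ref_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(ext_counts.values())  # share of extracted tokens that are relevant
    recall = overlap / sum(ref_counts.values())     # share of relevant tokens that were extracted
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "Storm closes airport on Monday"
    extracted = "Storm closes airport on Monday Share this story"
    print(round(token_f1(extracted, reference), 3))
```

Under this convention, an extractor that keeps the whole article but also pulls in a few words of page furniture retains perfect recall and loses precision, which lowers F1.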


Bibliographic details

Source: 18th ACM Conference on Information and Knowledge Management 2009 | 2009 | pp. 1685-1688 | 4 pages
Authors: Eduardo Laber; Criston Souza; Iam Jabour; Evelin Amorim; Eduardo Cardoso; Raul Renteria; Lucio Tinoco; Caio Dias
Original format: PDF
Chinese Library Classification: Information processing
Keywords: algorithms; experiments

