Learning to Extract Web News Title in Template Independent Way




Abstract

Many news sites have large collections of news pages generated dynamically and continuously from underlying databases. Automatic extraction of news titles and contents from news pages is therefore an important technique for applications such as news aggregation systems. However, previous work has found it challenging to extract news titles accurately from news pages of varying styles. In this paper, we propose a machine learning approach to this problem. Our approach is template-independent and is therefore unaffected by template updates, which usually invalidate template-specific extractors. An empirical evaluation on 5,200 news Web pages collected from 13 major online news sites shows that our approach significantly improves the accuracy of news title extraction.
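The abstract does not state which features or learner the authors use. As a purely hypothetical illustration of what "template-independent" can mean in practice, the sketch below scores every visible text node of a page using a few generic features (similarity to the `<title>` element, heading markup, position in the page, length) and returns the best-scoring node. The feature set, the `HEADING_TAGS` set and the hand-set `WEIGHTS` are assumptions standing in for a trained model; they are not the paper's method.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Hypothetical features and weights; the abstract does not specify the real ones.
HEADING_TAGS = {"h1", "h2", "h3"}
VOID_TAGS = {"br", "img", "meta", "link", "hr", "input"}  # never get an end tag
WEIGHTS = [2.0, 1.5, 0.5, 0.5]  # hand-set stand-ins for weights a learner would fit


class CandidateCollector(HTMLParser):
    """Collects the <title> text and every visible text node with its enclosing tag."""

    def __init__(self):
        super().__init__()
        self.stack = []       # currently open (non-void) tags
        self.page_title = ""  # text of the <title> element
        self.candidates = []  # (text, enclosing_tag, order_index)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop up to and including the matching tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        tag = self.stack[-1] if self.stack else ""
        if tag == "title":
            self.page_title += text
        elif tag not in ("script", "style"):
            self.candidates.append((text, tag, len(self.candidates)))


def features(text, tag, index, page_title, n_candidates):
    """Generic features: none refer to site-specific paths or class names."""
    similarity = SequenceMatcher(None, text.lower(), page_title.lower()).ratio()
    return [
        similarity,                           # overlap with the <title> element
        1.0 if tag in HEADING_TAGS else 0.0,  # marked up as a heading
        1.0 - index / n_candidates,           # earlier text nodes score higher
        min(len(text.split()), 30) / 30.0,    # length in words, capped at 30
    ]


def extract_title(html):
    """Return the highest-scoring text node, or None if the page has no text."""
    parser = CandidateCollector()
    parser.feed(html)
    parser.close()
    n = len(parser.candidates)
    if n == 0:
        return None

    def score(candidate):
        text, tag, index = candidate
        feats = features(text, tag, index, parser.page_title, n)
        return sum(w * f for w, f in zip(WEIGHTS, feats))

    return max(parser.candidates, key=score)[0]


if __name__ == "__main__":
    page = """<html><head><title>Floods hit coastal towns - Example News</title></head>
<body><div class="nav">Home | World | Sports</div>
<h1>Floods hit coastal towns</h1>
<p>Heavy rain caused widespread flooding on Monday.</p></body></html>"""
    print(extract_title(page))  # Floods hit coastal towns
```

Because none of the features depend on a particular site's markup paths or CSS classes, a template redesign does not break the scorer outright. In a real system the weights would be fitted on labeled pages rather than set by hand.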


Bibliographic Record

Source: International Conference on Rough Sets and Knowledge Technology, 2009, 8 pages
Authors: Can Wang; Junfeng Wang; Chun Chen; Li Lin; Ziyu Guan; Junyan Zhu; Cheng Zhang; Jiajun Bu
Original format: PDF
Chinese Library Classification: TP18-53
Keywords: Data extraction; Web mining; Web news


Similar Literature


1. What Web Template Extractor Should I Use? A Benchmarking and Comparison for Five Template Extractors [J]. Alarte Julian, Silva Josep, Tamarit Salvador. ACM Transactions on the Web, 2019, No. 2
2. Learning page-independent heuristics for extracting data from Web pages [J]. William W. Cohen, Wei Fan. Computer Networks, 1999, No. 11-16
3. Learning to Extract Web News Title in Template Independent Way [C]. Can Wang, Junfeng Wang, Chun Chen. Rough Sets and Knowledge Technology, 2009
4. Learning from Web-based news: The role of interactivity and motivation [D]. Tremayne, Mark Winslow. 2002
5. Rendu-Osler-Weber disease: a triple eponymous title lives on [O]. D D Gibbs. 1986
6. What Web Template Extractor Should I Use? A Benchmarking and Comparison for Five Template Extractors [O]. Julián Alarte, Josep Silva, Salvador Tamarit. 2019
7. Learning to Extract Symbolic Knowledge from the World Wide Web [R]. Craven, M., McCallum, A., DiPasquo, D. 1998
