Learning to Extract Web News Title in Template Independent Way


Abstract

Many news sites have large collections of news pages generated dynamically and continuously from underlying databases. Automatic extraction of news titles and contents from news pages is therefore an important technique for applications such as news aggregation systems. However, previous work has found that accurately extracting news titles from news pages of various styles is a challenging task. In this paper, we propose a machine learning approach to tackle this problem. Our approach is independent of templates and thus does not suffer from template updates, which usually invalidate the corresponding extractors. Empirical evaluation over 5,200 news Web pages collected from 13 important online news sites shows that our approach significantly improves the accuracy of news title extraction.


Bibliographic record

Source
Rough Sets and Knowledge Technology | 2009 | pp. 192-199 | 8 pages
Conference location: Gold Coast (AU)
Authors
Can Wang; Junfeng Wang; Chun Chen; Li Lin; Ziyu Guan; Junyan Zhu; Cheng Zhang; Jiajun Bu
Author affiliations

College of Computer Science, Zhejiang University, China;

College of Computer Science, Zhejiang University, China;

College of Computer Science, Zhejiang University, China;

College of Computer Science, Zhejiang University, China;

College of Computer Science, Zhejiang University, China;

College of Computer Science, Zhejiang University, China;

China Disabled Persons' Federation Information Center;

College of Computer Science, Zhejiang University, China;

Original format: PDF
Language: English
Chinese Library Classification: Programming; software engineering
Keywords
data extraction; web mining; web news

Similar literature

1. What Web Template Extractor Should I Use? A Benchmarking and Comparison for Five Template Extractors [J]. Alarte Julian, Silva Josep, Tamarit Salvador. ACM Transactions on the Web. 2019, No. 2

2. Learning page-independent heuristics for extracting data from Web pages [J]. William W. Cohen, Wei Fan. Computer Networks. 1999, No. 11-16

3. Learning to Extract Web News Title in Template Independent Way [C]. Can Wang, Junfeng Wang, Chun Chen. International Conference on Rough Sets and Knowledge Technology. 2009

4. Learning from Web-based news: The role of interactivity and motivation [D]. Tremayne, Mark Winslow. 2002

5. Rendu-Osler-Weber disease: a triple eponymous title lives on [O]. D D Gibbs. 1986

6. What Web Template Extractor Should I Use? A Benchmarking and Comparison for Five Template Extractors [O]. Julián Alarte, Josep Silva, Salvador Tamarit. 2019

7. Learning to Extract Symbolic Knowledge from the World Wide Web [R]. Craven, M., McCallum, A., DiPasquo, D. 1998
