Conference: ACM/IEEE Joint Conference on Digital Libraries

A Very Efficient Approach to News Title and Content Extraction on the Web



Abstract

We consider the problem of efficient, template-independent news extraction on the Web. Popular news extraction methods rely on visual information; they achieve good accuracy but are computationally inefficient, because rendering a web page to obtain visual information is very time-consuming. In this paper we propose an efficient and effective news extraction approach based on novel features. Our approach needs neither training nor visual information, so it is simple and very efficient, and it can extract news information from various news sites without using templates. In our experiments, the proposed approach achieves 99% accuracy over 5,671 news pages from 20 different news sites, and it is much faster than the baseline machine learning method that uses visual information.
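The abstract does not say which features the method relies on. Purely as an illustration of what a render-free, training-free extractor can look like, the sketch below parses raw HTML with Python's standard `html.parser` and keeps the block with the most non-link text. The link-density heuristic, the names `BlockCollector` and `extract`, and the sample page are all assumptions made for this example; they are not taken from the paper.

```python
from html.parser import HTMLParser

BLOCK_TAGS = {"div", "article", "section", "td"}  # assumed container tags
SKIP_TAGS = {"script", "style"}                   # never part of the article text


class BlockCollector(HTMLParser):
    """Collects text per block element straight from the markup (no rendering)."""

    def __init__(self):
        super().__init__()
        self.stack = []      # open block elements, innermost last
        self.blocks = []     # closed blocks: {"text": [...], "link_chars": int}
        self.title = ""      # text of <title>
        self.h1 = ""         # text of the first <h1>
        self.in_link = 0     # depth inside <a>
        self.skip = 0        # depth inside <script>/<style>
        self.current = None  # "title" or "h1" while inside those tags

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip += 1
        elif tag in BLOCK_TAGS:
            self.stack.append({"text": [], "link_chars": 0})
        elif tag == "a":
            self.in_link += 1
        elif tag in ("title", "h1"):
            self.current = tag

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip = max(0, self.skip - 1)
        elif tag in BLOCK_TAGS and self.stack:
            self.blocks.append(self.stack.pop())
        elif tag == "a":
            self.in_link = max(0, self.in_link - 1)
        elif tag in ("title", "h1"):
            self.current = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self.skip:
            return
        if self.current == "title":
            self.title += text
        elif self.current == "h1" and not self.h1:
            self.h1 = text
        if self.stack:
            # Text is credited to the innermost open block only.
            block = self.stack[-1]
            block["text"].append(text)
            if self.in_link:
                block["link_chars"] += len(text)


def extract(html):
    """Return (title, content) using a simple link-density heuristic."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    parser.blocks.extend(parser.stack)  # keep blocks the page never closed

    def score(block):
        total = sum(len(t) for t in block["text"])
        # Penalise link-heavy blocks such as menus and footers.
        return total - 2 * block["link_chars"]

    best = max(parser.blocks, key=score, default=None)
    content = " ".join(best["text"]) if best else ""
    title = parser.h1 or parser.title
    return title, content


# Hypothetical sample page, made up for this example.
sample = """<html><head><title>Example News - Storm hits coast</title></head><body>
<div><a href="/">Home</a> <a href="/world">World</a> <a href="/sport">Sport</a></div>
<div><h1>Storm hits coast</h1><p>A strong storm reached the coast on Monday.</p>
<p>Officials said no injuries were reported.</p></div>
<div><a href="/about">About us</a></div>
</body></html>"""

title, content = extract(sample)
print(title)
print(content)
```

Because the sketch only walks the markup once and never builds a layout, its cost grows with the size of the HTML rather than with rendering work, which is the kind of saving the abstract attributes to avoiding visual information.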
