Journal: International Journal of Knowledge-Based and Intelligent Engineering Systems
Development of an intelligent distributed news retrieval system

Abstract

Current web news retrieval systems struggle because web-based news retrieval must quickly and accurately process a very large volume of data that is constantly being updated. In this paper, we present the development of an intelligent distributed web news retrieval system whose goal is to accurately retrieve and organize web news information. It includes: a novel optimized crawler algorithm whose fetching speed is several times faster than that of a traditional crawler; a keen tag-based extraction algorithm that extracts data-rich content with minimal manual effort and classifies data as important or not important, so that the crawler can revisit and update the important data; and a modified MapReduce, improved by estimating the execution time of each subtask, which is shown to reduce the number of unusual tasks and shorten the overall job execution time.
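
The abstract does not say how the execution time of each subtask is estimated. As a minimal sketch of the general idea only, and not the authors' method, the Python below assumes each task reports a progress fraction, extrapolates its total run time at a constant rate, and flags tasks whose estimate is well above the median. The names `TaskProgress`, `estimate_total_time`, `find_slow_tasks`, and the `slowdown_factor` threshold are all hypothetical.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class TaskProgress:
    task_id: str
    elapsed_s: float  # seconds since the task started
    progress: float   # fraction of work completed, in [0, 1]


def estimate_total_time(task: TaskProgress) -> float:
    """Extrapolate total run time, assuming a constant progress rate."""
    if task.progress <= 0:
        return float("inf")  # no progress yet, so no finite estimate
    return task.elapsed_s / task.progress


def find_slow_tasks(tasks, slowdown_factor=1.5):
    """Return ids of tasks whose estimated total time exceeds
    slowdown_factor times the median finite estimate (an assumed rule)."""
    estimates = {t.task_id: estimate_total_time(t) for t in tasks}
    finite = [e for e in estimates.values() if e != float("inf")]
    if not finite:
        return []
    threshold = slowdown_factor * median(finite)
    return [tid for tid, est in estimates.items() if est > threshold]


tasks = [
    TaskProgress("a", elapsed_s=10.0, progress=0.5),
    TaskProgress("b", elapsed_s=10.0, progress=0.4),
    TaskProgress("c", elapsed_s=10.0, progress=0.1),
]
slow = find_slow_tasks(tasks)
print(slow)
```

A scheduler could use such a list to re-launch or rebalance the flagged tasks; whether the paper's modified MapReduce works this way is not stated in the abstract.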