Journal: Web Intelligence and Agent Systems

FICA: A novel intelligent crawling algorithm based on reinforcement learning



Abstract

The web is a huge and highly dynamic environment that is growing exponentially in content and developing fast in structure. No search engine can cover the whole web, so it has to focus on the most valuable pages for crawling. An efficient crawling algorithm for retrieving the most important pages therefore remains a challenging issue. Several algorithms, such as PageRank and OPIC, have been proposed. Unfortunately, they have high time complexity and low throughput. In this paper, an intelligent crawling algorithm based on reinforcement learning, called FICA, is proposed that models a random surfing user. The priority for crawling pages is based on a concept we call logarithmic distance. FICA is easy to implement, and its time complexity is O(E log V), where V and E are the number of nodes and edges in the web graph, respectively. Comparison of FICA with other proposed algorithms shows that FICA outperforms them in discovering highly important pages. Furthermore, FICA computes the importance (ranking) of each page during the crawling process, so it can also be used as a ranking method for computing page importance. A nice property of FICA is its adaptability to the web: it adjusts dynamically to changes in the web graph. We have used the UK web graph to evaluate our approach.
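The abstract does not define logarithmic distance or the reinforcement-learning update, so the following is only a minimal sketch of how a distance-ordered crawl frontier might look, not the authors' implementation. It assumes, purely for illustration, that following a link out of page u costs log10(out_degree(u)), and it leaves out any learning step. The names `crawl_order`, `graph`, and `seed` are hypothetical. With a binary heap, each edge triggers at most one push, which is consistent with an O(E log V) bound.

```python
import heapq
import math


def crawl_order(graph, seed):
    """Return (order, dist) for a best-first crawl of a toy link graph.

    graph: dict mapping a page to the list of pages it links to.
    ASSUMPTION: a link leaving page u costs log10(out_degree(u)). The
    abstract does not give this definition; it is used here only to
    illustrate a distance-based crawl priority.
    """
    dist = {seed: 0.0}          # best known distance from the seed
    frontier = [(0.0, seed)]    # min-heap of (distance, page)
    crawled = set()
    order = []

    while frontier:
        d, page = heapq.heappop(frontier)
        if page in crawled:
            continue            # stale heap entry, already fetched
        crawled.add(page)
        order.append(page)

        links = graph.get(page, [])
        if not links:
            continue
        cost = math.log10(len(links))   # assumed per-link cost
        for child in links:
            new_dist = d + cost
            if child not in crawled and new_dist < dist.get(child, math.inf):
                dist[child] = new_dist
                heapq.heappush(frontier, (new_dist, child))

    return order, dist


if __name__ == "__main__":
    toy = {
        "a": ["b", "c"],
        "b": ["b1", "b2", "b3", "b4"],
        "c": ["d"],
    }
    order, dist = crawl_order(toy, "a")
    print(order)
```

In this toy graph, page "d" is fetched before the four children of "b", because the single link out of "c" carries zero assumed cost while each of the four links out of "b" costs log10(4). A plain breadth-first crawl would visit them in the opposite order.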
