Conference: Dependable, Autonomic and Secure Computing, 2009. DASC '09

Crawling Strategy of Focused Crawler Based on Niche Genetic Algorithm



Abstract

To improve the search efficiency of focused crawlers, we design a new crawling strategy based on the niche genetic algorithm. Rather than collecting and indexing all accessible hypertext documents in order to answer every possible ad-hoc query, the new strategy combines the advantages of hyperlink-structure and web-content strategies. It uses hyperlinks as genetic individuals, evaluates individual fitness with a topic-keyword-based vector space model (VSM), imports new URLs to implement crossover and mutation, and treats URLs that share the same prefix as a niche. The niche genetic algorithm guides the crawl direction so that the crawler selectively seeks out pages likely to be most relevant to a predefined set of topics. Experiments show that, compared with other algorithms, the strategy achieves higher precision and recall in searching for topic pages.
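The abstract names the ingredients but not the exact operators, so what follows is only a minimal Python sketch under stated assumptions, not the authors' implementation. Each individual is a URL. Fitness is the cosine similarity between a page's term-frequency vector and a topic-keyword weight vector, which is a basic VSM. A niche is the set of URLs sharing a prefix, here assumed to be the host plus the first path segment. Selection uses fitness sharing (each URL's fitness divided by its niche size), a common niching technique swapped in because the abstract does not say how niches affect selection. "Crossover" is assumed to mean expanding selected parents into their unvisited out-links, and "mutation" is assumed to mean importing a URL from an external pool. The in-memory `web` dict (url -> (page_text, out_links)) and all function and parameter names are hypothetical.

```python
import math
import random
import re
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vsm_fitness(text, topic_weights):
    """Cosine similarity between the page's term-frequency vector and a
    topic-keyword weight vector (assumed weighting scheme)."""
    tf = Counter(tokenize(text))
    dot = sum(tf[t] * w for t, w in topic_weights.items())
    norm_page = math.sqrt(sum(c * c for c in tf.values()))
    norm_topic = math.sqrt(sum(w * w for w in topic_weights.values()))
    if norm_page == 0 or norm_topic == 0:
        return 0.0
    return dot / (norm_page * norm_topic)


def niche_key(url):
    """Hypothetical prefix rule: host plus first path segment."""
    parts = urlsplit(url)
    first = parts.path.strip("/").split("/")[0]
    return f"{parts.netloc}/{first}"


def select(population, fitness, k):
    """Fitness sharing: divide each URL's fitness by the size of its niche,
    so one crowded prefix cannot take over the population."""
    niches = defaultdict(list)
    for url in population:
        niches[niche_key(url)].append(url)
    shared = {u: fitness[u] / len(niches[niche_key(u)]) for u in population}
    return sorted(population, key=lambda u: shared[u], reverse=True)[:k]


def crawl(web, seeds, topic_weights, generations=3, pop_size=4,
          mutation_pool=(), mutation_rate=0.2, seed=0):
    """Evolve a population of URLs over the toy `web` mapping and return
    every visited URL ranked by raw VSM fitness."""
    rng = random.Random(seed)
    fitness = {}
    population = list(seeds)
    for _ in range(generations):
        # Evaluate newly reached individuals.
        for url in population:
            if url not in fitness:
                fitness[url] = vsm_fitness(web.get(url, ("", []))[0], topic_weights)
        parents = select(population, fitness, pop_size)
        # Assumed "crossover": offspring are the parents' unvisited out-links.
        children = [link for p in parents for link in web.get(p, ("", []))[1]
                    if link not in fitness]
        # Assumed "mutation": occasionally import an unvisited outside URL.
        if mutation_pool and rng.random() < mutation_rate:
            extra = rng.choice(list(mutation_pool))
            if extra not in fitness:
                children.append(extra)
        population = parents + list(dict.fromkeys(children))
    # Score the final generation as well.
    for url in population:
        if url not in fitness:
            fitness[url] = vsm_fitness(web.get(url, ("", []))[0], topic_weights)
    return sorted(fitness, key=fitness.get, reverse=True)
```

Fitness sharing is only one way to realize niches; crowding or per-niche quotas would also fit the description of URLs with a common prefix forming a niche. In this sketch, a high-scoring page on a lightly populated prefix can outrank a slightly better page whose prefix is already crowded, which keeps the crawl from collapsing onto a single site section.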
