Conference: International Conference on Dependable, Autonomic and Secure Computing

Crawling Strategy of Focused Crawler Based on Niche Genetic Algorithm

Abstract

To improve the search efficiency of a focused crawler, we design a new crawling strategy based on the niche genetic algorithm. Rather than collecting and indexing all accessible hypertext documents in order to answer all possible ad-hoc queries, the new strategy combines the advantages of hyperlink-structure and web-content strategies: it treats each hyperlink as a genetic individual, evaluates individual fitness with a vector space model (VSM) built on topic keywords, imports new URLs to implement crossover and mutation, and regards URLs that share the same prefix as a niche. The niche genetic algorithm then guides the crawl direction so that the crawler selectively seeks out pages likely to be most relevant to a predefined set of topics. Experiments show that, compared with other algorithms, the strategy achieves higher precision and recall when searching for topic pages.
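The abstract names two building blocks: a VSM-based fitness for each hyperlink and niches formed by URLs that share a prefix. A minimal sketch of these two pieces might look as follows. It is not the authors' implementation. The names `cosine_fitness`, `niche_key`, and `select_by_niche` are hypothetical, as are the use of raw term frequencies, the definition of "prefix" as host plus first path segment, and the rule of keeping the fittest URLs per niche. Crossover and mutation are not shown.

```python
import math
from collections import Counter, defaultdict
from urllib.parse import urlparse


def term_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace split)."""
    return Counter(text.lower().split())


def cosine_fitness(text, topic_keywords):
    """VSM fitness: cosine similarity between a text and the topic keywords."""
    doc = term_vector(text)
    topic = Counter(k.lower() for k in topic_keywords)
    dot = sum(doc[term] * weight for term, weight in topic.items())
    norm = math.sqrt(sum(v * v for v in doc.values())) * math.sqrt(
        sum(w * w for w in topic.values())
    )
    return dot / norm if norm else 0.0


def niche_key(url):
    """Assumed niche definition: host plus the first path segment."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    first = segments[0] if segments else ""
    return f"{parsed.netloc}/{first}"


def select_by_niche(candidates, topic_keywords, per_niche=1):
    """Keep the fittest `per_niche` URLs inside every niche.

    `candidates` maps URL -> text describing it (for example, anchor text).
    """
    niches = defaultdict(list)
    for url, text in candidates.items():
        fitness = cosine_fitness(text, topic_keywords)
        niches[niche_key(url)].append((fitness, url))
    selected = []
    for members in niches.values():
        members.sort(reverse=True)  # highest fitness first
        selected.extend(url for _, url in members[:per_niche])
    return selected


if __name__ == "__main__":
    candidates = {
        "http://a.example/sports/1": "football match results",
        "http://a.example/sports/2": "cooking recipes",
        "http://b.example/news/1": "football news",
    }
    print(select_by_niche(candidates, ["football", "match"]))
```

Selecting within each niche, instead of globally, keeps one well-scoring site section from crowding out all others, which is the usual motivation for niching in genetic algorithms.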
