Latin American Web Conference

Improving the Efficiency of a Genre-Aware Approach to Focused Crawling Based on Link Context

Abstract

Focused crawlers attempt to crawl web pages that are relevant to a specific topic or user interest. Although such crawlers have proven effective, their efficiency still needs improvement. Focused crawlers usually maintain a frontier of non-visited URLs, from which they choose the next pages to visit and collect the relevant ones. In this work, we define and evaluate a queueing policy for non-visited URLs, based on link context, to improve the efficiency of a genre-aware focused crawler. Our experimental evaluation shows an efficiency improvement of around 100% in some situations.
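The abstract does not specify how link context is scored or how the queue is ordered. The sketch below makes several assumptions. First, a link's context is the anchor text plus nearby words. Second, relevance is approximated by the fraction of context tokens that appear in a set of genre/topic terms. Third, the frontier is a max-priority queue on that score. The names `ContextFrontier` and `context_score` and the term-overlap scoring are hypothetical illustrations, not the authors' policy.

```python
import heapq
import itertools
import re


def tokenize(text):
    """Lowercase alphanumeric tokens of a link-context string."""
    return re.findall(r"[a-z0-9]+", text.lower())


def context_score(link_context, topic_terms):
    """Hypothetical score: fraction of context tokens that are topic terms."""
    tokens = tokenize(link_context)
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in topic_terms)
    return hits / len(tokens)


class ContextFrontier:
    """Frontier of non-visited URLs ordered by link-context score (a sketch)."""

    def __init__(self, topic_terms):
        self.topic_terms = {t.lower() for t in topic_terms}
        self._heap = []
        self._seen = set()               # URLs already queued, to avoid duplicates
        self._counter = itertools.count()  # tie-breaker: FIFO among equal scores

    def push(self, url, link_context):
        if url in self._seen:
            return
        self._seen.add(url)
        score = context_score(link_context, self.topic_terms)
        # heapq is a min-heap, so store the negated score to pop the best first
        heapq.heappush(self._heap, (-score, next(self._counter), url))

    def pop(self):
        """Return (url, score) for the highest-scoring non-visited URL."""
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    frontier = ContextFrontier({"syllabus", "course", "lecture"})
    frontier.push("http://a.example/contact", "contact us")
    frontier.push("http://b.example/cs101", "course syllabus for CS101")
    frontier.push("http://c.example/news", "latest lecture news")
    while frontier:
        print(frontier.pop())
```

With a plain FIFO queue the crawler would visit URLs in discovery order. Ordering by context score instead lets links whose surrounding text looks on-genre be fetched earlier. This is the general idea behind a link-context queueing policy, though the paper's actual scoring may differ.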