Conference: International Symposium on String Processing and Information Retrieval

Learning to Schedule Webpage Updates Using Genetic Programming


Abstract

A key challenge in designing a freshness-oriented scheduling policy is estimating the likelihood that a previously crawled webpage has since been modified on the web. This estimate determines the order in which pages are revisited, and it can be exploited to reduce the cost of monitoring crawled pages to keep their stored versions up to date. Here we present a novel approach for generating score functions that accurately rank pages by their probability of having been modified relative to their previously crawled versions. We propose a flexible framework that uses genetic programming to evolve such score functions. We also present a thorough experimental evaluation of the benefits of our framework over five state-of-the-art baselines.
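The idea of evolving a score function and using it to order revisits can be illustrated with a toy genetic-programming loop. This is a minimal sketch under stated assumptions, not the authors' framework: the abstract does not name the features, operators, fitness measure, or GP parameters, so the per-page features `change_rate` and `age`, the precision@k fitness, the operator set, the synthetic labels, and all helper names below are hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical per-page features; the paper's actual feature set is not given.
FEATURES = ["change_rate", "age"]
FUNCS = {"add": lambda a, b: a + b,
         "sub": lambda a, b: a - b,
         "mul": lambda a, b: a * b,
         "div": lambda a, b: a / b if abs(b) > 1e-6 else 1.0}  # protected division
MAX_DEPTH = 6  # assumed cap on tree depth to limit bloat


def random_tree(rng, depth):
    """Grow a random expression tree: a terminal or a tuple (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(FEATURES) if rng.random() < 0.8 else round(rng.uniform(-1, 1), 2)
    return (rng.choice(list(FUNCS)), random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def score(tree, page):
    """Evaluate a tree on one page, given as a dict of feature values."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCS[op](score(left, page), score(right, page))
    return page[tree] if isinstance(tree, str) else tree  # feature or constant


def precision_at_k(tree, pages, labels, k):
    """Assumed fitness: fraction of the k top-scored pages that were modified."""
    def key(i):
        s = score(tree, pages[i])
        return s if math.isfinite(s) else float("-inf")  # push inf/nan to the bottom
    ranked = sorted(range(len(pages)), key=key, reverse=True)
    return sum(labels[i] for i in ranked[:k]) / k


def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    if isinstance(tree, tuple):
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))


def replace(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(node)


def depth(tree):
    return 1 + max(depth(tree[1]), depth(tree[2])) if isinstance(tree, tuple) else 0


def crossover(rng, a, b):
    """Subtree crossover: graft a random subtree of b into a random point of a."""
    path, _ = rng.choice(list(subtrees(a)))
    _, donor = rng.choice(list(subtrees(b)))
    child = replace(a, path, donor)
    return child if depth(child) <= MAX_DEPTH else a


def mutate(rng, tree):
    """Subtree mutation: replace a random node with a fresh random subtree."""
    path, _ = rng.choice(list(subtrees(tree)))
    child = replace(tree, path, random_tree(rng, 2))
    return child if depth(child) <= MAX_DEPTH else tree


def tournament(rng, pop, fits, size=3):
    return pop[max(rng.sample(range(len(pop)), size), key=lambda i: fits[i])]


def evolve(pages, labels, k, pop_size=40, generations=15, seed=0):
    rng = random.Random(seed)
    pop = [random_tree(rng, 3) for _ in range(pop_size)]
    history = []  # best fitness per generation
    for _ in range(generations):
        fits = [precision_at_k(t, pages, labels, k) for t in pop]
        history.append(max(fits))
        nxt = [pop[max(range(pop_size), key=lambda i: fits[i])]]  # elitism
        while len(nxt) < pop_size:
            child = crossover(rng, tournament(rng, pop, fits), tournament(rng, pop, fits))
            nxt.append(mutate(rng, child) if rng.random() < 0.2 else child)
        pop = nxt
    fits = [precision_at_k(t, pages, labels, k) for t in pop]
    best = max(range(pop_size), key=lambda i: fits[i])
    return pop[best], fits[best], history


def synthetic_crawl(n=200, seed=1):
    """Toy data: a page counts as modified when change_rate * age > 0.25."""
    rng = random.Random(seed)
    pages = [{"change_rate": rng.random(), "age": rng.random()} for _ in range(n)]
    labels = [int(p["change_rate"] * p["age"] > 0.25) for p in pages]
    return pages, labels


if __name__ == "__main__":
    pages, labels = synthetic_crawl()
    best, fit, _ = evolve(pages, labels, k=20)  # k = revisit budget per cycle
    print("best score function:", best)
    print("precision@20:", fit)
```

Elitism keeps the best tree of each generation unchanged, so the best fitness never decreases across generations. The top-k pages under the evolved function would be the ones scheduled for revisiting in the next crawl cycle, which is how a ranking translates into a visiting order under a fixed budget.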
