Conference: International Conference on Database Systems for Advanced Applications

A Hybrid Approach for Refreshing Web Page Repositories

Abstract

Web pages change frequently, so crawlers must re-download them often. Various policies have been proposed for refreshing local copies of web pages. In this paper, we introduce a new sampling method that outperforms other change detection methods in our experiments. Change Frequency (CF) is a method that predicts how often pages change and, in the long run, achieves optimal efficiency compared with the sampling method. Here, we propose a new hybrid method that combines our new sampling approach with CF, and we show how it improves the efficiency of change detection.
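
The abstract does not give the algorithms themselves, so the following is only a minimal sketch of the general ideas it names, not the authors' method. It assumes a Poisson change model for the CF estimate, a uniform random sample of each site's pages for the sampling estimate, and a simple weighted blend for the hybrid score. All function names, the 0.5 smoothing term, and the `weight` parameter are hypothetical choices made for illustration.

```python
import math
import random
from typing import Callable, Sequence


def estimate_change_rate(changes_detected: int, visits: int, interval_days: float) -> float:
    """CF-style estimate of changes per day (assumption: Poisson change model).

    Under that model, a page changes between two visits with probability
    1 - exp(-rate * interval). The 0.5 smoothing terms keep the estimate
    finite when every visit detected a change.
    """
    unchanged_fraction = (visits - changes_detected + 0.5) / (visits + 0.5)
    return -math.log(unchanged_fraction) / interval_days


def sampled_change_ratio(
    pages: Sequence[str],
    has_changed: Callable[[str], bool],
    sample_size: int,
    rng: random.Random,
) -> float:
    """Sampling-style estimate: fraction of a random sample that has changed."""
    if not pages or sample_size <= 0:
        return 0.0
    sample = rng.sample(list(pages), min(sample_size, len(pages)))
    changed = sum(1 for page in sample if has_changed(page))
    return changed / len(sample)


def hybrid_score(sample_ratio: float, rate: float, interval_days: float, weight: float = 0.5) -> float:
    """Hypothetical hybrid: blend the sampled ratio with the CF change probability."""
    cf_probability = 1.0 - math.exp(-rate * interval_days)
    return weight * sample_ratio + (1.0 - weight) * cf_probability


if __name__ == "__main__":
    rng = random.Random(0)
    pages = [f"https://example.com/page{i}" for i in range(100)]
    # Stand-in change detector: pretend every fourth page changed.
    changed = {page for i, page in enumerate(pages) if i % 4 == 0}
    ratio = sampled_change_ratio(pages, lambda p: p in changed, 20, rng)
    rate = estimate_change_rate(changes_detected=5, visits=10, interval_days=1.0)
    print(ratio, rate, hybrid_score(ratio, rate, interval_days=1.0))
```

A crawler following this sketch would rank sites by `hybrid_score` and refresh the highest-scoring ones first. With few past visits the CF estimate is noisy, so a larger `weight` on the sampled ratio may be preferable early on. That trade-off is an assumption of this sketch, not a result from the paper.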