Conference: International Workshop on Computer Science and Engineering

The Implementation of a Web Crawler URL Filter Algorithm Based on Caching


Abstract

For large-scale Web information collection, the URL filter module plays an important role in a Web crawler, which is a central component of a search engine. The performance of the URL filter module directly influences the efficiency of the entire collection system. This paper introduces a URL filter algorithm based on caching and its implementation. The stability and parallel performance of the algorithm are verified by experiments on Websites that contain a large number of web pages. Experimental results show that the proposed algorithm can achieve satisfactory performance through reasonable adjustment of some of its parameters, and that it is especially suitable for URL filtering on Websites that have many page-navigation links and index pages.
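The abstract does not describe the algorithm's internals, so the following is only a minimal sketch of the general idea of putting a cache in front of a URL "seen" check, not the paper's method. The class name `CachedUrlFilter`, the LRU eviction policy, the `capacity` parameter, and the in-memory `store` set (a stand-in for a slower persistent store) are all assumptions made for illustration.

```python
from collections import OrderedDict


class CachedUrlFilter:
    """Hypothetical URL filter: a small LRU cache in front of a full 'seen' store."""

    def __init__(self, capacity=1000):
        self.capacity = capacity    # assumed tunable parameter (cache size)
        self.cache = OrderedDict()  # recently seen URLs, least recent first
        self.store = set()          # stand-in for a slower persistent store
        self.cache_hits = 0         # duplicates rejected without touching the store

    def is_new(self, url):
        """Return True the first time a URL is offered, False afterwards."""
        if url in self.cache:
            # Fast path: a recently seen URL is rejected from the cache alone.
            self.cache.move_to_end(url)
            self.cache_hits += 1
            return False
        # Slow path: consult the full store, then remember the URL in both places.
        seen_before = url in self.store
        self.store.add(url)
        self.cache[url] = True
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used URL
        return not seen_before


# Index and navigation links tend to repeat across pages of the same site.
url_filter = CachedUrlFilter(capacity=2)
links = ["/index", "/page/1", "/index", "/page/2", "/index"]
fresh = [u for u in links if url_filter.is_new(u)]
print(fresh, url_filter.cache_hits)
```

In this sketch, links that recur often (such as an index page linked from every page) stay in the cache and are filtered without a store lookup, which is one plausible reason a cache would help on sites with many navigation links and index pages. The cache size is the kind of parameter one would expect to tune.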
