The Implementation of a Web Crawler URL Filter Algorithm Based on Caching


Abstract

For large-scale Web information collection, the URL filter module plays an important role in a Web crawler, which is a central component of a search engine. The performance of the URL filter module directly influences the efficiency of the entire collection system. This paper introduces a URL filter algorithm based on caching and describes its implementation. Experiments on websites that serve a large number of web pages verify the algorithm's stability and its behavior under parallel execution. The results show that the proposed algorithm achieves satisfactory performance when some of its parameters are reasonably adjusted, and that it is especially suitable for filtering the URLs of websites that have many page-navigation links and index pages.
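The abstract does not spell out the algorithm itself. As a rough illustration of the general idea of a cache-based URL filter, the sketch below places a small LRU cache in front of a complete set of seen URLs, so that links repeated on many pages (navigation and index links, for example) are rejected without consulting the full store. The class name `CachedUrlFilter`, the `cache_size` parameter, and the in-memory `set` standing in for the full store are all assumptions made for this example and are not taken from the paper.

```python
from collections import OrderedDict


class CachedUrlFilter:
    """Duplicate-URL filter: small LRU cache in front of a complete seen-set.

    Hypothetical illustration only; not the algorithm from the paper.
    """

    def __init__(self, cache_size=1000):
        self.cache_size = cache_size  # assumed tunable parameter
        self.cache = OrderedDict()    # recently seen URLs in LRU order
        self.seen = set()             # stand-in for a slower, complete store
        self.cache_hits = 0
        self.store_lookups = 0

    def _remember(self, url):
        # Insert the URL as most recently used and evict the oldest if full.
        self.cache[url] = True
        self.cache.move_to_end(url)
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)

    def is_new(self, url):
        """Return True the first time a URL is offered, False afterwards."""
        if url in self.cache:
            # Fast path: a cached URL has certainly been seen before.
            self.cache_hits += 1
            self.cache.move_to_end(url)
            return False
        # Slow path: consult the complete store, then cache the URL.
        self.store_lookups += 1
        new = url not in self.seen
        self.seen.add(url)
        self._remember(url)
        return new


url_filter = CachedUrlFilter(cache_size=2)
links = ["/index", "/page/1", "/index", "/page/2", "/index"]
print([url_filter.is_new(u) for u in links])
# prints [True, True, False, True, False]
```

In this toy run the repeated `/index` link is answered from the cache both times, so only three of the five checks reach the full store. How large the cache should be for a given site is the kind of parameter the abstract says needs reasonable adjustment.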


Bibliographic record

Source: International Workshop on Computer Science and Engineering, 2009, 4 pages
Authors: Wang Hui-chang; Ruan Shu-hua; Tans Qi-jie
Original format: PDF
Chinese Library Classification: TP3-53
Keywords: Web Crawler; URL Filter; Caching


Similar literature


1. UCrawler: A learning-based web crawler using a URL knowledge base [J]. Wang Wei, Yu Lihua. Journal of Computational Methods in Sciences and Engineering, 2021, No. 2.
2. Cache-optimized implementation of the filtered backprojection algorithm on a digital signal processor [J]. Ricardo A. Neri-Calderon, Sergio Alcaraz-Corona, Ramon M. Rodriguez-Dagnino. Journal of Electronic Imaging, 2007, No. 4.
3. A Web-Based Multimedia Retrieval System with MCA-Based Filtering and Subspace-Based Learning Algorithms [J]. Chao Chen, Tao Meng, Lin Lin. International Journal of Multimedia Data Engineering & Management, 2013, No. 2.
4. The Implementation of a Web Crawler URL Filter Algorithm Based on Caching [C]. Wang Hui-chang, Ruan Shu-hua, Tans Qi-jie. International Workshop on Computer Science and Engineering, 2009.
5. Scalable Cooperative Caching Algorithm based on Bloom Filters [D]. Siddikov, Nodirjon. 2011.
6. A Noise Filtering Algorithm for Event-Based Asynchronous Change Detection Image Sensors on TrueNorth and Its Implementation on TrueNorth [O]. Vandana Padala, Arindam Basu, Garrick Orchard. 2018.
7. FilteredWeb: A Framework for the Automated Search-Based Discovery of Blocked URLs [O]. Darer, Alexander; Farnan, Oliver; Wright, Joss. 2017.
