
Focused Web Crawler

Abstract

With the rapid growth of data on the World Wide Web and the equally rapid growth in the number of web users worldwide, there is an acute need to improve, modify, or design search algorithms that can effectively and efficiently retrieve the specific data a user requires from this huge repository. Search engines use different web crawlers to obtain search results efficiently. Some use a focused web crawler, which collects web pages satisfying some specific property by prioritizing the crawl frontier and managing the hyperlink exploration process. A focused web crawler analyzes its crawl boundary to locate the links most likely to be relevant to the crawl and avoids irrelevant regions of the web. This yields significant savings in hardware and network resources and helps keep the crawl up to date. The goal of a focused web crawler is to nurture a collection of web documents centered on certain topical subspaces. It identifies the next most relevant link to follow by relying on probabilistic models that predict the relevance of each document. Researchers have proposed various algorithms for improving the efficiency of focused web crawlers. We investigate the various types of crawlers along with their pros and cons, with the focused web crawler as our major focus, and discuss future directions for improving its efficiency. This survey is intended as a base reference for anyone who wishes to research or apply the concept of a focused web crawler in their own work. The performance of a focused web crawler depends on the richness of links within the specific topic being searched, and it usually relies on a general web search engine to provide starting points for the crawl.
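The abstract does not specify a concrete algorithm, but the best-first, frontier-prioritizing behaviour it describes can be sketched in a few lines of Python using only the standard library. The sketch below is illustrative only: the relevance function is a toy term-overlap score standing in for the probabilistic relevance models the abstract mentions, and the seed URLs, topic terms, and threshold are all hypothetical.

import heapq
import re
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collects href values from anchor tags on a fetched page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def relevance(text, topic_terms):
    # Toy relevance score: fraction of (lowercase) topic terms present
    # in the page text. A real focused crawler would use a trained
    # probabilistic classifier here, as the abstract describes.
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sum(t in words for t in topic_terms) / len(topic_terms)

def focused_crawl(seeds, topic_terms, max_pages=20, threshold=0.3):
    # Best-first crawl: the frontier is a max-heap (negated scores)
    # ordered by the relevance of the page each link was found on.
    frontier = [(-1.0, url) for url in seeds]  # seeds get top priority
    heapq.heapify(frontier)
    visited, results = set(), []
    while frontier and len(results) < max_pages:
        _, url = heapq.heappop(frontier)
        if url in visited:
            continue
        visited.add(url)
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except Exception:
            continue
        score = relevance(html, topic_terms)
        if score < threshold:
            continue  # prune irrelevant regions of the web
        results.append((score, url))
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            child = urljoin(url, href)
            if child.startswith("http") and child not in visited:
                # Children inherit the parent's score as their priority.
                heapq.heappush(frontier, (-score, child))
    return results

# Hypothetical usage: in practice the seeds would come from a general
# web search engine, as the abstract notes.
# pages = focused_crawl(["https://example.com"], {"crawler", "search", "web"})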
