An XML based Web Crawler with Page Revisit Policy and Updation in Local Repository of Search Engine

Jyoti Mor; Dr Dinesh Rai; Dr Naresh Kumar


Abstract

In a large collection of web pages, it is difficult for search engines to keep their online repository up to date. Major search engines run hundreds of web crawlers that crawl the WWW day and night and send the downloaded web pages over the network to be stored in the search engine's database. This results in overutilization of resources such as network bandwidth and CPU cycles. This paper proposes an architecture that tries to reduce the utilization of shared network resources with the help of an advanced XML-based approach. The architecture is based on focused crawling: it is trained to download only high-quality data from the internet, leaving behind web pages that are not relevant to the desired domain. A detailed layout of the proposed system is described, which is capable of reducing the load on the network and reducing the problems that arise from the residency of a mobile agent at the remote server.
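The abstract does not give the algorithm, so the following is only a minimal sketch of the general idea it names: filter pages by relevance, and on a revisit update the local repository only when a page has changed. The names `DOMAIN_TERMS`, `is_relevant`, `page_record` and `revisit`, the keyword-overlap rule, and the use of a SHA-256 content hash inside a small XML record are all assumptions made for illustration, not the authors' design.

```python
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical domain vocabulary; the abstract does not say how relevance is scored.
DOMAIN_TERMS = {"crawler", "search", "index"}


def is_relevant(text, min_hits=2):
    """Keep a page only if it mentions enough domain terms (assumed rule)."""
    words = set(text.lower().split())
    return len(words & DOMAIN_TERMS) >= min_hits


def page_record(url, text):
    """Summarise a page as a small XML element carrying a content hash."""
    rec = ET.Element("page", url=url)
    ET.SubElement(rec, "hash").text = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return rec


def revisit(repository, url, text):
    """Update the local repository only if the page is relevant and changed.

    Returns True when the repository was updated.
    """
    if not is_relevant(text):
        return False
    new_hash = page_record(url, text).findtext("hash")
    if repository.get(url) == new_hash:
        return False  # unchanged since the last visit: nothing to transfer
    repository[url] = new_hash
    return True


repo = {}
print(revisit(repo, "http://example.com/a", "web crawler and search engine"))  # first visit
print(revisit(repo, "http://example.com/a", "web crawler and search engine"))  # unchanged page
print(revisit(repo, "http://example.com/a", "web crawler and search index"))   # changed page
```

Under these assumptions, only the first and third calls update the repository; the unchanged revisit transfers nothing, which is the kind of bandwidth saving the abstract is aiming at.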


Bibliographic details

Source: International Journal of Engineering & Technology, 2018, No. 3, 5 pages
Authors: Jyoti Mor; Dr Dinesh Rai; Dr Naresh Kumar
Original format: PDF
Chinese Library Classification: Industrial technology
Keywords: WWW; Search Engine; Web Crawler; Network Resources; Page Revisit
