Distributed Web Crawlers using Hadoop

Pratiba D.; Shobha G.; Lalith Kumar H.; Samrudh J.


Abstract

A web crawler is software that traverses the World Wide Web (WWW) to build the database for a search engine. In recent years, web crawling has faced several challenges. First, web pages are highly unstructured, which makes it difficult to maintain a generic storage schema. Second, the WWW is too large to index in its entirety. Finally, the most difficult challenge is crawling the deep web. We propose a novel web crawler that uses Neo4J and HBase as data stores. It also applies Natural Language Processing (NLP) and machine learning techniques to address these problems.

Bibliographic details

Source
International Journal of Applied Engineering Research | 2017, Issue 8 | 9 pages
Authors
Pratiba D.; Shobha G.; Lalith Kumar H.; Samrudh J.
Author affiliations

Department of Computer Science and Engineering R V College of Engineering R V Vidyanikethan Post;

Department of Computer Science and Engineering R V College of Engineering R V Vidyanikethan Post;

Department of Computer Science and Engineering R V College of Engineering R V Vidyanikethan Post;

Department of Information Science and Engineering R V College of Engineering R V Vidyanikethan Post;

Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Basic engineering sciences
Keywords
NoSQL; Neo4J; HBase; Natural Language Processing; Reinforcement learning; Deep web


