Architecture specification of rule-based deep web crawler with indexer

S.G. Shaila; A. Vadivel

Abstract

An architecture specification is proposed for a deep web crawler, combined with a surface web crawler and an indexer, that uses rules to fetch a large number of documents from the deep web. The functional dependencies between the core and allied fields of a FORM are identified and used to generate rules with an SVM classifier, and the rules are classified as most preferable, least preferable, or mutually exclusive. FORMs are filled with values from the most preferable class so that the crawler fetches a large number of documents, and the extracted documents are indexed for information retrieval applications. The architecture is extended to a distributed crawler using web services. The architecture achieves a higher coverage rate and a reduced fetching time. Retrieval performance is encouraging, with precision similar to that of the Google search engine.
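The form-filling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the field values, the rule table, and the names `RuleClass`, `RULES`, and `form_submissions` are all hypothetical, and the class labels are assigned by hand here, whereas the paper derives them with an SVM classifier. The sketch only shows the idea of submitting value combinations from the most preferable class and skipping the rest.

```python
from enum import Enum
from itertools import product


class RuleClass(Enum):
    MOST_PREFERABLE = "most preferable"
    LEAST_PREFERABLE = "least preferable"
    MUTUALLY_EXCLUSIVE = "mutually exclusive"


# Hypothetical candidate values for one core field and one allied field.
CORE_VALUES = ["laptop", "camera"]        # e.g. a keyword text box
ALLIED_VALUES = ["electronics", "books"]  # e.g. a category menu

# Hypothetical rule table mapping (core value, allied value) to a class.
# Assumption: hand-assigned labels stand in for the SVM classifier's output.
RULES = {
    ("laptop", "electronics"): RuleClass.MOST_PREFERABLE,
    ("camera", "electronics"): RuleClass.MOST_PREFERABLE,
    ("laptop", "books"): RuleClass.LEAST_PREFERABLE,
    ("camera", "books"): RuleClass.MUTUALLY_EXCLUSIVE,
}


def form_submissions(core_values, allied_values, rules):
    """Return the value pairs to submit, keeping only the most preferable class."""
    return [
        {"core": core, "allied": allied}
        for core, allied in product(core_values, allied_values)
        if rules.get((core, allied)) is RuleClass.MOST_PREFERABLE
    ]


submissions = form_submissions(CORE_VALUES, ALLIED_VALUES, RULES)
for values in submissions:
    print(values)
```

Pairs that have no entry in the rule table are skipped as well, since `rules.get` returns `None` for them; a real crawler might instead fall back to the least preferable class once the most preferable values are exhausted.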

Bibliographic details

Source
International Journal of Knowledge and Web Intelligence | 2013, Issue 3 | pp. 166-186 | 21 pages
Authors
S.G. Shaila; A. Vadivel
Author affiliations

Department of Computer Applications, Multimedia Information Retrieval Group, National Institute of Technology, Tiruchirappalli 620 015, India;

Department of Computer Applications, Multimedia Information Retrieval Group, National Institute of Technology, Tiruchirappalli 620 015, India;

Original format: PDF
Language: English
Keywords
information retrieval; surface web; deep web crawler; rules; web services; indexer; distributed crawler;

Similar literature

1. A Novel Architecture for Deep Web Crawler [J]. Dilip Kumar Sharma (Shobhit University, Meerut, India). International Journal of Information Technology and Web Engineering. 2011, Issue 1
2. Advancement in Web Indexing through Web crawlers [J]. Akarsh Gupta, Monu Rana, Saurabh Teotia. International Journal of Engineering Research and Applications. 2020, Issue 4
3. Web Crawler for Indexing Video e-Learning Resources: A YouTube Case Study [J]. Bogdan IANCU. Informatica Economica. 2019, Issue 2
4. A New Architecture of an Intelligent Agent-Based Crawler for Domain-Specific Deep Web Databases [C]. Li Yanni, Wang Yuping, Tian Erfeng. IEEE/WIC/ACM International Conference on Web Intelligence (WI 2012) and IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT 2012), with co-located workshops. 2012
5. Web crawler indexing: An approach by clustering [D]. Menon, Dhanya C. 2004
6. A user-oriented web crawler for selectively acquiring online content in e-health research [O]. Songhua Xu, Hong-Jun Yoon, Georgia Tourassi
7. Implementasi Directed Acyclic Word Graph Dengan Menggunakan Algoritma Blow the Bridge Pada Web Crawler Untuk Indexing Web (Implementation of a Directed Acyclic Word Graph Using the Blow the Bridge Algorithm in a Web Crawler for Web Indexing) [O]. Raharjanto, Santosa, Susanto, Budi, Santosa, Raden Gunawan. 2007
