HiCrawl: A Hidden Web Crawler for Medical Domain


Abstract

The Hidden Web refers to the large portion of the WWW that holds numerous freely accessible Web databases hidden behind search form interfaces. Their content can be reached only through dynamic Web pages generated in response to user queries issued at the search form interface. The core challenge in implementing any Hidden Web crawler is therefore to get past these search form interfaces routinely by automatically generating and issuing queries that help discover those dynamic Web pages. This paper presents a novel approach that guides the crawler in choosing the right query term to submit to any search form interface designed to accept keywords or terms as input. The system is based on classification hierarchies, which may be constructed either manually or automatically. For illustration, we consider search form interfaces in the 'Medical' domain, one of the domains most popular among researchers, together with a manually generated top-down classification hierarchy for the same domain.
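
The idea of drawing query terms from a top-down classification hierarchy can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the hierarchy or the selection rule, so the tiny `HIERARCHY`, the breadth-first ordering in `top_down_terms`, and the `fake_submit` stand-in for a real search form are all assumptions made purely for illustration.

```python
from collections import deque
from typing import Callable, Dict, Iterator, List

# Hypothetical hand-built hierarchy (parent term -> child terms).
# The hierarchy actually used in the paper is not given in the abstract.
HIERARCHY: Dict[str, List[str]] = {
    "medicine": ["cardiology", "oncology"],
    "cardiology": ["arrhythmia", "hypertension"],
    "oncology": ["leukemia"],
}


def top_down_terms(hierarchy: Dict[str, List[str]], root: str) -> Iterator[str]:
    """Yield candidate query terms level by level, general before specific."""
    queue = deque([root])
    seen = set()
    while queue:
        term = queue.popleft()
        if term in seen:
            continue  # guard against a term listed under two parents
        seen.add(term)
        yield term
        queue.extend(hierarchy.get(term, []))


def crawl(
    hierarchy: Dict[str, List[str]],
    root: str,
    submit: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Submit each term to a keyword search form; keep terms that return pages."""
    results: Dict[str, List[str]] = {}
    for term in top_down_terms(hierarchy, root):
        pages = submit(term)
        if pages:
            results[term] = pages
    return results


def fake_submit(term: str) -> List[str]:
    """Assumed stand-in for a real form submission; returns result page paths."""
    index = {"cardiology": ["/doc/1"], "leukemia": ["/doc/2", "/doc/3"]}
    return index.get(term, [])


if __name__ == "__main__":
    print(list(top_down_terms(HIERARCHY, "medicine")))
    print(crawl(HIERARCHY, "medicine", fake_submit))
```

In a real crawler, `submit` would fill the form's keyword field and fetch the resulting dynamic page; keeping it as a plain callable lets the term-selection logic be exercised without any network access.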


Bibliographic details

Source
International Symposium on Computational and Business Intelligence, 2013, pp. 152-157 (6 pages)
Authors
Gupta S.; Bhatia K.K.
Original format
PDF
Keywords
Content Retrieval; Hidden Web; Surface Web; WWW; automatic form filling; crawlers; form processing

Similar literature

1. Design of a Least Cost (LC) Vertical Search Engine based on Domain Specific Hidden Web Crawler [J]. Sudhakar Ranjan, Komal Kumar Bhatia. International Journal of Information Retrieval Research, 2017, no. 2.
2. Extraction of Query Interfaces for Domain-Specific Hidden Web Crawler [J]. Nupur Gupta. International Journal of Computer Science and Network Security, 2016, no. 2.
3. A Novel Design of Hidden Web Crawler using Ontology [J]. Manvi, Komal Kumar Bhatia, Ashutosh Dixit. International Journal of Engineering Trends and Technology, 2015, no. 1.
4. HiCrawl: A Hidden Web Crawler for Medical Domain [C]. Gupta S., Bhatia K.K. International Symposium on Computational and Business Intelligence, 2013.
5. Constructing Web Crawlers for the World Art Dynamics Technology Platform [D]. Guo, Xueyuan. 2019.
6. SEE: structured representation of scientific evidence in the biomedical domain using Semantic Web techniques [O]. Christian Bölling, Michael Weidlich, Hermann-Georg Holzhütter. 2014.
7. A novel design of hidden web crawler using ontology [O]. Manvi, Bhatia, Komal Kumar, Dixit, Ashutosh. 2015.
