Journal: IEEE Transactions on Knowledge and Data Engineering

Prequery Discovery of Domain-Specific Query Forms: A Survey

Abstract

The discovery of HTML query forms is one of the main challenges in deep web crawling. Automatic solutions to this problem perform two main tasks. The first is locating HTML forms on the web, which is done with traditional or focused crawlers. The second is identifying which of these forms are actually meant for querying, which typically also involves determining a domain for the underlying data source (and thus for the form as well). This problem has attracted a great deal of interest, resulting in a long list of algorithms and techniques. Some methods submit requests through the forms and then analyze the data retrieved in response, which typically requires considerable domain knowledge as well as semantic processing. Others avoid these difficulties by not submitting forms at all, although some of these techniques still rely to some extent on semantics and domain knowledge. This survey gives an up-to-date review of methods for the discovery of domain-specific query forms that do not involve form submission. We detail these methods and discuss how form discovery has become increasingly automated over time. We conclude with a forecast of what we believe are the immediate next steps in this trend.
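
To make the two tasks concrete, here is a minimal Python sketch of prequery form inspection: it extracts the forms from a page that has already been fetched and applies a simple structural test, without ever submitting a form. Everything in it is an assumption made for illustration and is not taken from the survey: the names `FormCollector`, `extract_forms`, and `looks_like_query_form` are hypothetical, and the heuristic (reject forms with password or file fields, require at least one text-like or select field) is only a stand-in for the classifiers the surveyed methods actually use. Domain identification is not attempted.

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Collect each <form> on a page together with the types of its fields."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per completed form
        self._current = None  # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            method = (attrs.get("method") or "get").lower()
            self._current = {"method": method, "fields": []}
        elif self._current is not None:
            if tag == "input":
                # An <input> without a type attribute defaults to "text".
                self._current["fields"].append((attrs.get("type") or "text").lower())
            elif tag in ("select", "textarea"):
                self._current["fields"].append(tag)

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def extract_forms(html):
    """Task 1 (locating forms), restricted to a single already-fetched page."""
    parser = FormCollector()
    parser.feed(html)
    parser.close()
    return parser.forms


def looks_like_query_form(form):
    """Task 2 (is this form meant for querying?), as an illustrative heuristic only."""
    fields = form["fields"]
    # Login and upload forms are assumed not to be query interfaces.
    if "password" in fields or "file" in fields:
        return False
    # Assume a query form exposes at least one text-like or select field.
    return any(f in ("text", "search", "select") for f in fields)
```

A crawler could call `extract_forms` on each page it visits and keep only the forms for which `looks_like_query_form` returns `True`; a realistic system would replace the heuristic with a trained classifier and add a domain-identification step.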
