Understanding Web query interfaces

Abstract

Recently, the Web has been rapidly "deepened" by many searchable databases online, where data are hidden behind query forms. For modelling and integrating Web databases, the very first challenge is to understand what a query interface says, or what query capabilities a source supports. Such automatic extraction of interface semantics is challenging, as query forms are created autonomously. Our approach builds on the observation that, across myriad sources, query forms seem to reveal some "concerted structure" by sharing common building blocks. Building on this insight, we hypothesize the existence of a hidden syntax that guides the creation of query interfaces, albeit from different sources. This hypothesis effectively transforms query interfaces into a visual language with a non-prescribed grammar and, thus, their semantic understanding into a parsing problem. Such a paradigm enables principled solutions for both declaratively representing common patterns, by a derived grammar, and systematically interpreting query forms, by a global parsing mechanism. To realize this paradigm, we must address the challenges of a hypothetical syntax: that it is to be derived, and that it is secondary to the input. At the heart of our form extractor, we thus develop a 2P grammar and a best-effort parser, which together realize a parsing mechanism for a hypothetical syntax. Our experiments show the promise of this approach: it achieves above 85% accuracy for extracting query conditions across random sources.
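
To make the "parsing problem" framing concrete, the following is a minimal toy sketch, not the 2P grammar or the parser described above. It assumes a form has already been tokenized into labelled elements with row and column positions. The `Token` type, the three entries in `PATTERNS`, and the same-row adjacency rule are all hypothetical choices made for illustration. The only ideas it borrows from the abstract are that common patterns are declared as data and that the parser interprets as much of the form as it can, setting aside whatever it cannot match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    kind: str   # hypothetical token kinds: "text", "textbox", "select"
    value: str  # visible label text, or the form field's name
    row: int    # visual row of the element on the page
    col: int    # left-to-right position within the row


# Hypothetical, hand-written patterns (not derived from real forms).
# Longer patterns come first so they win over their shorter prefixes.
PATTERNS = [
    ("label_operator_value", ("text", "select", "textbox")),
    ("label_value", ("text", "textbox")),
    ("label_choice", ("text", "select")),
]


def parse_form(tokens):
    """Group tokens into query conditions; return (conditions, unparsed)."""
    ordered = sorted(tokens, key=lambda t: (t.row, t.col))
    conditions, unparsed = [], []
    i = 0
    while i < len(ordered):
        for name, kinds in PATTERNS:
            window = ordered[i:i + len(kinds)]
            same_row = all(t.row == window[0].row for t in window)
            if (len(window) == len(kinds) and same_row
                    and all(t.kind == k for t, k in zip(window, kinds))):
                conditions.append({
                    "pattern": name,
                    "attribute": window[0].value,
                    "fields": [t.value for t in window[1:]],
                })
                i += len(kinds)
                break
        else:
            # Best effort: keep going instead of failing on unmatched input.
            unparsed.append(ordered[i])
            i += 1
    return conditions, unparsed


if __name__ == "__main__":
    form = [
        Token("text", "Author", 0, 0),
        Token("select", "author_mode", 0, 1),
        Token("textbox", "author", 0, 2),
        Token("text", "Title", 1, 0),
        Token("textbox", "title", 1, 1),
        Token("text", "Search tips", 2, 0),
    ]
    found, leftover = parse_form(form)
    for condition in found:
        print(condition)
    print("unparsed:", [t.value for t in leftover])
```

In this sketch, a stray label such as "Search tips" is reported as unparsed rather than causing the whole parse to fail, which is the sense in which the toy parser is best-effort.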
