Conference: ACM SIGMOD International Conference on Management of Data

Expressive and Flexible Access to Web-Extracted Data: A Keyword-based Structured Query Language

Abstract

Automated extraction of structured data from Web sources often leads to large heterogeneous knowledge bases (KBs), with data and schema items numbering in the hundreds of thousands or millions. Formulating information needs with conventional structured query languages is difficult due to the sheer size of the schema information available to the user. We address this challenge by proposing a new query language that blends keyword search with structured query processing over large information graphs with rich semantics. Our formalism for structured queries based on keywords combines the flexibility of keyword search with the expressiveness of structured queries. We propose a solution to the disambiguation problem that results from introducing keywords as primitives in a structured query language. We show how expressions in our proposed language can be rewritten using the vocabulary of the web-extracted KB, and how the different possible rewritings can be ranked based on their syntactic relationship to the keywords in the query as well as their semantic coherence in the underlying KB. An extensive experimental study demonstrates the efficiency and effectiveness of our approach. Additionally, we show how our query language fits into QUICK, an end-to-end information system that integrates web-extracted data graphs with full-text search. In this system, the rewritten query describes an arbitrary topic of interest for which corresponding entities, and documents relevant to the entities, are efficiently retrieved.
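The ranking idea in the abstract (score each candidate rewriting by how closely its KB items match the query keywords and by how well those items fit together in the KB) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the paper's algorithm: the vocabulary `VOCAB`, the relatedness set `RELATED`, the string-similarity measure, the pairwise coherence measure, and the weight `alpha` are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical KB vocabulary (schema items); not taken from the paper.
VOCAB = ["GermanPhysicist", "GermanCity", "hasWonPrize", "NobelPrize", "bornIn"]

# Hypothetical pairs of items that are connected in the toy KB.
RELATED = {
    frozenset({"GermanPhysicist", "hasWonPrize"}),
    frozenset({"hasWonPrize", "NobelPrize"}),
    frozenset({"GermanPhysicist", "bornIn"}),
    frozenset({"bornIn", "GermanCity"}),
}


def syntactic(keyword, item):
    """String similarity in [0, 1] between a keyword and a KB item name."""
    return SequenceMatcher(None, keyword.lower(), item.lower()).ratio()


def coherence(items):
    """Fraction of adjacent item pairs that are connected in the toy KB."""
    pairs = list(zip(items, items[1:]))
    if not pairs:
        return 1.0
    return sum(frozenset(p) in RELATED for p in pairs) / len(pairs)


def rank_rewritings(keywords, top_k=2, alpha=0.5):
    """Return (score, rewriting) tuples, best first.

    Each keyword is mapped to its top_k most similar KB items; every
    combination of those items is one candidate rewriting. The score is an
    assumed weighted sum of average string similarity and coherence.
    """
    candidates = [
        sorted(VOCAB, key=lambda v: syntactic(k, v), reverse=True)[:top_k]
        for k in keywords
    ]
    scored = []
    for combo in product(*candidates):
        syn = sum(syntactic(k, v) for k, v in zip(keywords, combo)) / len(keywords)
        scored.append((alpha * syn + (1 - alpha) * coherence(combo), combo))
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored


if __name__ == "__main__":
    for score, combo in rank_rewritings(["german", "prize"]):
        print(f"{score:.3f}  {combo}")
```

In this toy setting, the keyword "german" is textually closer to `GermanCity` than to `GermanPhysicist`, yet the rewriting that pairs `GermanPhysicist` with `hasWonPrize` ranks first because those two items are connected in the toy KB. This shows how a coherence signal can override a purely syntactic match.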
