Conference: International Conference on Management of Data

Sharing Work in Keyword Search over Databases

Abstract

An important means of allowing non-expert end-users to pose ad hoc queries, whether over single databases or data integration systems, is through keyword search. Given a set of keywords, the query processor finds matches across different tuples and tables. It computes and executes a set of relational sub-queries whose results are combined to produce the k highest-ranking answers.

Work on keyword search primarily focuses on single-database, single-query settings: each query is answered in isolation, despite possible overlap between queries posed by different users or at different times; and the number of relevant tables is assumed to be small, meaning that sub-queries can be processed without using cost-based methods to combine work. As we apply keyword search to support ad hoc data integration queries over scientific or other databases on the Web, we must reuse and combine computation. In this paper, we propose an architecture that continuously receives sets of ranked keyword queries, and seeks to reuse work across these queries. We extend multiple query optimization and continuous query techniques, and develop a new query plan scheduling module we call the ATC (based on its analogy to an air traffic controller). The ATC manages the flow of tuples among a multitude of pipelined operators, minimizing the work needed to return the top-k answers for all queries. We also develop techniques to manage the sharing and reuse of state as queries complete and input data streams are exhausted. We show the effectiveness of our techniques in handling queries over real and synthetic data sets.
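The idea of reusing sub-query work across keyword queries can be illustrated with a toy sketch. This is not the paper's ATC scheduler or its pipelined operators; it is a much simpler stand-in that memoizes sub-query results and merges them into a top-k list. The class `SharedSubqueryCache`, the function `top_k`, the table names, and the scores are all hypothetical and chosen only for illustration.

```python
import heapq
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical representation: a sub-query is identified by a string key
# and yields (score, answer) pairs when executed.
Row = Tuple[float, str]


class SharedSubqueryCache:
    """Runs each distinct sub-query at most once and reuses its rows."""

    def __init__(self, run: Callable[[str], List[Row]]) -> None:
        self._run = run
        self._rows: Dict[str, List[Row]] = {}
        self.executions = 0  # number of real executions, for illustration

    def rows(self, subquery: str) -> List[Row]:
        if subquery not in self._rows:
            self._rows[subquery] = self._run(subquery)
            self.executions += 1
        return self._rows[subquery]


def top_k(cache: SharedSubqueryCache, subqueries: Iterable[str], k: int) -> List[Row]:
    """Combine the rows of all sub-queries and keep the k highest-scoring ones."""
    candidates = (row for sq in subqueries for row in cache.rows(sq))
    return heapq.nlargest(k, candidates)


# Made-up sub-query results standing in for relational joins over a database.
TOY_RESULTS: Dict[str, List[Row]] = {
    "gene JOIN protein": [(0.9, "g1-p1"), (0.4, "g2-p7")],
    "protein JOIN pathway": [(0.8, "p1-w3"), (0.2, "p9-w1")],
    "gene JOIN pathway": [(0.7, "g1-w3")],
}


def run_subquery(subquery: str) -> List[Row]:
    return list(TOY_RESULTS[subquery])


cache = SharedSubqueryCache(run_subquery)
# Two keyword queries whose sub-query sets overlap in one sub-query.
q1 = top_k(cache, ["gene JOIN protein", "protein JOIN pathway"], k=2)
q2 = top_k(cache, ["protein JOIN pathway", "gene JOIN pathway"], k=2)
print(q1, q2, cache.executions)
```

In this sketch the two queries reference four sub-queries in total, but only three are executed because the overlapping one is served from the cache. The architecture described in the abstract goes well beyond this: it schedules tuple flow among pipelined operators and manages shared state as queries complete, none of which is modeled here.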