Efficient IR-Style Keyword Search over Relational Databases

Abstract

Applications in which plain text coexists with structured data are pervasive. Commercial relational database management systems (RDBMSs) generally provide querying capabilities for text attributes that incorporate state-of-the-art information retrieval (IR) relevance ranking strategies, but this search functionality requires that queries specify the exact column or columns against which a given list of keywords is to be matched. This requirement can be cumbersome and inflexible from a user perspective: good answers to a keyword query might need to be "assembled" (in perhaps unforeseen ways) by joining tuples from multiple relations. This observation has motivated recent research on free-form keyword search over RDBMSs. In this paper, we adapt IR-style document-relevance ranking strategies to the problem of processing free-form keyword queries over RDBMSs. Our query model can handle queries with both AND and OR semantics, and exploits the sophisticated single-column text-search functionality often available in commercial RDBMSs. We develop query-processing strategies that build on a crucial characteristic of IR-style keyword search: only the few most relevant matches (according to some definition of "relevance") are generally of interest. Consequently, rather than computing all matches for a keyword query, which leads to inefficient executions, our techniques focus on the top-k matches for the query, for moderate values of k. A thorough experimental evaluation over real data shows the performance advantages of our approach.
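To make the query model concrete, here is a minimal Python sketch of what a joined, ranked answer with AND or OR semantics could look like. It is not the paper's algorithm: the two toy relations, the column names, the occurrence-count `score` function, and `top_k_joined` are all hypothetical, and the sketch naively enumerates every join before keeping the k best, which is exactly the kind of exhaustive execution the paper's top-k strategies are meant to avoid.

```python
import heapq

# Hypothetical toy relations, joined on product_id (not from the paper).
products = [
    {"product_id": 1, "descr": "laptop with long battery life"},
    {"product_id": 2, "descr": "desktop tower"},
]
complaints = [
    {"complaint_id": 10, "product_id": 1, "comments": "battery drains fast"},
    {"complaint_id": 11, "product_id": 1, "comments": "screen flickers"},
    {"complaint_id": 12, "product_id": 2,
     "comments": "noisy fan and weak battery backup"},
]


def score(text, keywords):
    """Toy relevance score: total keyword occurrences in one text attribute."""
    words = text.lower().split()
    return sum(words.count(k) for k in keywords)


def top_k_joined(keywords, k, semantics="OR"):
    """Return the k best (score, product_id, complaint_id) joined answers.

    OR: a joined answer qualifies if it contains at least one keyword.
    AND: every keyword must appear somewhere in the joined answer.
    """
    keywords = [w.lower() for w in keywords]
    candidates = []
    for p in products:
        for c in complaints:
            if p["product_id"] != c["product_id"]:
                continue  # tuples must join on the shared key
            texts = [p["descr"], c["comments"]]
            all_words = " ".join(texts).lower().split()
            if semantics == "AND" and not all(w in all_words for w in keywords):
                continue
            # Combine the per-attribute scores by summation (an assumption).
            total = sum(score(t, keywords) for t in texts)
            if total > 0:
                candidates.append((total, p["product_id"], c["complaint_id"]))
    # Keep only the k highest-scoring answers.
    return heapq.nlargest(k, candidates)


print(top_k_joined(["battery"], k=2))
print(top_k_joined(["battery", "fan"], k=1, semantics="AND"))
```

In the first query, the best answer joins a product and a complaint that both mention "battery", something a single-column text search could not return as one result.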