Journal: ACM Transactions on Information Systems

High-Performance Processing of Text Queries with Tunable Pruned Term and Term Pair Indexes

Abstract

Term proximity scoring is an established means in information retrieval for improving the result quality of full-text queries. Integrating such proximity scores into efficient query processing, however, has not been equally well studied. Existing methods rely on precomputed lists of documents in which tuples of terms, usually pairs, occur together, which typically makes the index far larger than a term-only index. This article introduces a joint framework for trading off index size against result quality, and provides optimization techniques for tuning precomputed indexes towards either maximal result quality or maximal query processing performance under controlled result quality, given an upper bound on the index size. The framework also allows lists for term pairs to be materialized selectively, based on a query log, to further reduce index size. Extensive experiments with two large text collections demonstrate runtime improvements of more than one order of magnitude over existing text-based processing techniques with reasonable index sizes.
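
To make the idea of selectively materializing pair lists from a query log more concrete, here is a minimal Python sketch. It is not the article's optimization technique, which the abstract does not spell out. It assumes a simple greedy heuristic: rank term pairs by how often they appear together in logged queries per unit of list size, then keep pairs until a size budget is used up. The function names, the `list_sizes` mapping, and the `budget` parameter are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_counts(query_log):
    """Count how often each unordered term pair appears in a logged query."""
    counts = Counter()
    for query in query_log:
        # Deduplicate and sort terms so each pair has one canonical form.
        terms = sorted(set(query.lower().split()))
        counts.update(combinations(terms, 2))
    return counts


def select_pairs(query_log, list_sizes, budget):
    """Greedily choose pair lists to materialize under a size budget.

    Assumption (illustrative only): list_sizes maps a sorted term pair to the
    positive size of its precomputed list, and benefit is approximated by
    query-log frequency divided by that size.
    """
    counts = pair_counts(query_log)
    ranked = sorted(
        (pair for pair in counts if pair in list_sizes),
        key=lambda pair: (-counts[pair] / list_sizes[pair], pair),
    )
    chosen, used = [], 0
    for pair in ranked:
        size = list_sizes[pair]
        if used + size <= budget:  # skip lists that would exceed the budget
            chosen.append(pair)
            used += size
    return chosen, used


if __name__ == "__main__":
    log = ["new york hotel", "new york pizza", "cheap hotel"]
    sizes = {
        ("new", "york"): 40,
        ("hotel", "new"): 30,
        ("cheap", "hotel"): 10,
        ("pizza", "york"): 50,
    }
    print(select_pairs(log, sizes, budget=60))
```

With the toy log and sizes above, the sketch keeps the lists for ("cheap", "hotel") and ("new", "york") and uses 50 of the 60 available units. A real system would also have to account for the effect of each list on result quality, which this heuristic ignores.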
