ACM Transactions on Information Systems

Efficient Index-Based Snippet Generation

Abstract

Ranked result lists with query-dependent snippets have become state of the art in text search. They are typically implemented by searching, at query time, for occurrences of the query words in the top-ranked documents. This document-based approach has three inherent problems: (i) when a document is indexed by terms that it does not contain literally (e.g., related words or spelling variants), localizing the corresponding snippets becomes problematic; (ii) each query operator (e.g., phrase or proximity search) has to be implemented twice, once on the index side to compute the correct result set, and once on the snippet-generation side to generate the appropriate snippets; and (iii) in the worst case, the whole document must be scanned for occurrences of the query words, which can be problematic for very long documents. We present a new index-based method that localizes snippets using information computed solely from the index and that overcomes all three problems. Unlike previous index-based methods, we show how to achieve this at essentially no extra cost in query processing time, using a technique we call operator inversion. We also show how our index-based method allows individual segments, rather than complete documents, to be cached, which yields a significantly higher cache hit ratio than the document-based approach. We have fully integrated our implementation with the CompleteSearch engine.
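The general idea of index-based snippet localization can be illustrated with a minimal Python sketch, assuming a toy positional inverted index in which each posting is a (document ID, word position) pair and each document is stored as fixed-size segments. All names here (`PositionalIndex`, `SegmentCache`, `snippet_for`, `SEGMENT_SIZE`, the `variants` map) are hypothetical. This is not the CompleteSearch implementation, and it does not implement the paper's operator inversion technique. It only shows how positions returned by the index can locate a snippet without rescanning the document, even when the match comes from a term the document does not literally contain, and how segments rather than whole documents can be cached.

```python
from collections import OrderedDict, defaultdict

SEGMENT_SIZE = 8  # hypothetical: number of words per stored segment


class PositionalIndex:
    """Toy positional inverted index: term -> list of (doc_id, position)."""

    def __init__(self):
        self.postings = defaultdict(list)
        self.segments = {}  # (doc_id, seg_no) -> list of words in that segment

    def add_document(self, doc_id, text, variants=None):
        # variants: hypothetical map from a word to extra index terms,
        # e.g. {"color": ["colour"]}, so the document is also indexed
        # under terms it does not contain literally.
        variants = variants or {}
        words = text.split()
        for pos, word in enumerate(words):
            key = word.lower()
            for term in [key, *variants.get(key, [])]:
                self.postings[term].append((doc_id, pos))
        for start in range(0, len(words), SEGMENT_SIZE):
            seg_no = start // SEGMENT_SIZE
            self.segments[(doc_id, seg_no)] = words[start:start + SEGMENT_SIZE]

    def search(self, term):
        # The result already carries word positions, so snippet
        # localization needs no scan of the document text.
        return self.postings.get(term.lower(), [])


class SegmentCache:
    """LRU cache keyed by (doc_id, seg_no), i.e. segments, not whole documents."""

    def __init__(self, index, capacity=2):
        self.index = index
        self.capacity = capacity
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, doc_id, seg_no):
        key = (doc_id, seg_no)
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            # Dictionary lookup stands in for reading one segment from disk.
            self.cache[key] = self.index.segments[key]
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict least recently used
        return self.cache[key]


def snippet_for(cache, doc_id, positions):
    """Return the segment holding the earliest hit, with hits in it bolded."""
    seg_no = min(positions) // SEGMENT_SIZE
    offset = seg_no * SEGMENT_SIZE
    words = list(cache.get(doc_id, seg_no))  # copy so the cached segment is not modified
    for p in positions:
        if offset <= p < offset + SEGMENT_SIZE:
            words[p - offset] = f"**{words[p - offset]}**"
    return " ".join(words)


index = PositionalIndex()
index.add_document(
    1,
    "the color of the sky depends on how sunlight is scattered by air",
    variants={"color": ["colour"]},
)
cache = SegmentCache(index)

hits = defaultdict(list)
for doc_id, pos in index.search("colour"):  # spelling variant absent from the text
    hits[doc_id].append(pos)
for doc_id, positions in hits.items():
    print(snippet_for(cache, doc_id, positions))  # prints "the **color** of the sky depends on how"
```

Because the cache key is `(doc_id, seg_no)`, a later query whose hits fall in the same segment is served from the cache even though the rest of the document was never loaded. In this sketch the second segment of document 1 is never read at all.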
