Efficient Index-based Snippet Generation

Abstract

Ranked result lists with query-dependent snippets have become the state of the art in text search. They are typically implemented by searching the top-ranked documents at query time for occurrences of the query words. This document-based approach has three inherent problems: (i) when a document is indexed by terms it does not contain literally (e.g., related words or spelling variants), localizing the corresponding snippets becomes problematic; (ii) each query operator (e.g., phrase or proximity search) has to be implemented twice, once on the index side to compute the correct result set and once on the snippet-generation side to produce the appropriate snippets; and (iii) in the worst case, the whole document must be scanned for occurrences of the query words, which is problematic for very long documents. We present a new index-based method that localizes snippets using only information computed from the index, and that overcomes all three problems. Unlike previous index-based methods, ours achieves this at essentially no extra cost in query processing time, by a technique we call query rotation. We also show how our index-based method allows individual segments, instead of complete documents, to be cached, which yields a significantly higher cache hit ratio than the document-based approach. We have fully integrated our implementation with the CompleteSearch engine.
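The abstract does not spell out query rotation, so the following is not the authors' method. It is a minimal Python sketch of the two ideas the abstract does describe: localizing a snippet purely from positions stored in a positional inverted index (including a posting under a spelling variant the document does not contain literally), and fetching and caching only the fixed-size segments that cover the chosen window instead of whole documents. All names (`build_index`, `best_window`, `SegmentCache`, `SEGMENT_SIZE`, the `variants` table) are hypothetical choices for illustration.

```python
from collections import defaultdict

SEGMENT_SIZE = 8  # hypothetical: number of words per stored/cached segment


def build_index(docs, variants):
    """Positional inverted index: term -> list of (doc_id, position).

    `variants` (hypothetical) maps a literal word to extra index terms, so a
    document is also indexed under terms it does not contain literally, yet
    each posting still records where the matching word actually occurs.
    """
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            for term in {word, *variants.get(word, ())}:
                index[term].append((doc_id, pos))
    return index


def best_window(index, query_terms, doc_id, width):
    """Return (start, covered): the start of the `width`-word window in
    `doc_id` that covers the most distinct query terms, using index data only.
    Returns (None, 0) if no query term occurs in the document."""
    hits = sorted(
        (pos, term)
        for term in set(query_terms)
        for d, pos in index.get(term, ())
        if d == doc_id
    )
    best_start, best_cover = None, 0
    for i, (start, _) in enumerate(hits):
        # hits is sorted by position, so hits[i:] all lie at or after start
        covered = {t for p, t in hits[i:] if p < start + width}
        if len(covered) > best_cover:
            best_start, best_cover = start, len(covered)
    return best_start, best_cover


def segments_for_window(start, width, segment_size=SEGMENT_SIZE):
    """Segment numbers that overlap word positions start .. start + width - 1."""
    return list(range(start // segment_size, (start + width - 1) // segment_size + 1))


class SegmentCache:
    """Caches individual (doc_id, segment) entries rather than whole documents."""

    def __init__(self, docs, segment_size=SEGMENT_SIZE):
        self.docs, self.segment_size = docs, segment_size
        self.cache, self.hits, self.misses = {}, 0, 0

    def get(self, doc_id, seg_no):
        key = (doc_id, seg_no)
        if key in self.cache:
            self.hits += 1
        else:
            self.misses += 1
            # Slicing the in-memory text stands in for reading one segment from storage.
            words = self.docs[doc_id].split()
            lo = seg_no * self.segment_size
            self.cache[key] = words[lo:lo + self.segment_size]
        return self.cache[key]


def snippet(cache, doc_id, start, width):
    """Assemble the snippet text from only the segments that cover the window."""
    segs = segments_for_window(start, width, cache.segment_size)
    words = [w for s in segs for w in cache.get(doc_id, s)]
    offset = start - segs[0] * cache.segment_size
    return " ".join(words[offset:offset + width])
```

For example, with `variants = {"colour": ["color"]}`, the query term `color` locates position 1 of a document that only contains "colour", and a 5-word window starting there is served from segment 0 alone. The window scan is quadratic in the number of hits, which is acceptable for a sketch but not for a production engine.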

