Improved Single-Term Top-k Document Retrieval

Abstract

On natural language text collections, finding the k documents most relevant to a query is generally solved with inverted indexes. On general string collections, however, more sophisticated data structures are necessary. Navarro and Nekrich [SODA 2012] showed that a linear-space index can solve such top-k queries in optimal time O(m + k), where m is the query length. Konow and Navarro [DCC 2013] implemented the scheme, solving top-k queries within microseconds with an index that uses 3.3-4.0 bytes per character (including the storage of the collection itself). In this paper we introduce a new implementation that uses significantly less space, 2.5-3.0 bytes per character (again including the collection), while retaining similar query times. For short queries, which are the most difficult, our new index actually outperforms the previous one as well as all other solutions in the literature. We also show that our index can be built on very large text collections and that it handles phrase queries efficiently on natural language text collections. In the latter case, it uses about the same space as the tokenized text (which it replaces), while answering phrase queries an order of magnitude faster than a positional inverted index.
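
To make the query concrete, here is a naive Python baseline. It is not the index from the paper. It assumes relevance means term frequency (how often the pattern occurs in a document), a common choice in this line of work that the abstract does not spell out, and all function names are hypothetical. It sorts every suffix of every document, binary-searches the range of suffixes starting with the pattern using O(log n) comparisons of up to m symbols each, and then counts document ids across that whole range. That last scan costs time proportional to the number of occurrences, which is the cost the O(m + k) solution of Navarro and Nekrich avoids. Because the code works on any sequence, passing documents as lists of word tokens turns the pattern into a phrase query.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def build_suffix_array(docs: Sequence[Sequence]) -> List[Tuple[int, int]]:
    """Return every (doc_id, start) suffix of every document, sorted lexicographically.

    Naive construction by comparing slices: fine for illustration, far too slow
    and memory-hungry for real collections.
    """
    suffixes = [(d, s) for d, doc in enumerate(docs) for s in range(len(doc))]
    suffixes.sort(key=lambda p: (docs[p[0]][p[1]:], p[0]))
    return suffixes


def suffix_range(docs, sa, pattern) -> Tuple[int, int]:
    """Half-open range [lo, hi) of suffixes in `sa` that start with `pattern`."""
    m = len(pattern)

    def head(i):
        # First m symbols of the i-th suffix; truncation keeps the sequence sorted.
        d, s = sa[i]
        return docs[d][s:s + m]

    lo, hi = 0, len(sa)
    while lo < hi:  # leftmost suffix whose head is >= pattern
        mid = (lo + hi) // 2
        if head(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # leftmost suffix whose head is > pattern
        mid = (lo + hi) // 2
        if head(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def top_k(docs, sa, pattern, k: int) -> List[Tuple[int, int]]:
    """The k (doc_id, term_frequency) pairs with the most occurrences of `pattern`.

    Scans the whole suffix range, so it costs time proportional to the number
    of occurrences; ties are broken by the smaller doc_id.
    """
    lo, hi = suffix_range(docs, sa, pattern)
    tf = Counter(sa[i][0] for i in range(lo, hi))
    return sorted(tf.items(), key=lambda item: (-item[1], item[0]))[:k]


if __name__ == "__main__":
    docs = ["abracadabra", "cadabra", "abcab"]
    sa = build_suffix_array(docs)
    print(top_k(docs, sa, "ab", 2))  # [(0, 2), (2, 2)]

    # Phrase query: documents as word-token lists, pattern as a token list.
    words = [t.split() for t in ["to be or not to be", "be quick", "not to be outdone"]]
    wsa = build_suffix_array(words)
    print(top_k(words, wsa, ["to", "be"], 5))  # [(0, 2), (2, 1)]
```

Practical indexes replace this explicit list of suffixes with compressed structures; the abstract's 2.5-3.0 bytes per character figure includes the text itself, while this sketch stores Python tuples and slices and uses far more memory.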

Bibliographic Record

Source: Workshop on Algorithm Engineering and Experiments | 2015 | 187 p. | 9 pages
Authors: Simon Gog; Gonzalo Navarro
Original format: PDF
Chinese Library Classification: TP301.6-53

Similar Documents

1. Efficient Top-k Document Retrieval for Long Queries Using Term-Document Binary Matrix — Pursuit of Enhanced Informational Search on the Web — [J]. Etsuro Fujita, Keizo Oyama. IEICE Transactions on Information and Systems, 2013, No. 5.

2. Efficient Top-k Document Retrieval for Long Queries Using Term-Document Binary Matrix: Pursuit of Enhanced Informational Search on the Web [J]. Etsuro Fujita, Keizo Oyama. IEICE Transactions on Information and Systems, 2013, No. 5.

3. Practical Compact Indexes for Top-k Document Retrieval [J]. Simon Gog, Roberto Konow, Gonzalo Navarro. Journal of Experimental Algorithmics, 2017, No. 1.

4. Improved Single-Term Top-k Document Retrieval [C]. Simon Gog, Gonzalo Navarro. Workshop on Algorithm Engineering and Experiments, 2015.

5. Stepping stones and pathways: Improving retrieval by chains of relationships between documents [D]. Das Neves, Fernando Adrian. 2004.

6. Any-k: Anytime Top-k Tree Pattern Retrieval in Labeled Graphs [O]. Xiaofeng Yang, Patrick K. Nicholson, Deepak Ajwani.

7. Improved Single-Term Top-k Document Retrieval [O]. Simon Gog, Gonzalo Navarro. 2015.

8. Using Information Extraction to Improve Document Retrieval [R]. Bear, J., Israel, D., Petit, J. 1998.
