Approximating Document Frequency for Self-Index based Top-k Document Retrieval

Abstract

Top-k document retrieval, which returns the documents most relevant to a query, is an essential task in many applications. One promising index framework combines an FM-index with a wavelet tree to support efficient top-k document retrieval. This index, however, has difficulty handling document frequency (DF) at search time, because the indexed terms are all substrings of the document collection. Previous work either exhaustively searches every part of the index, most of which covers documents that are not relevant, to calculate DF, or stores precomputed DF values at the cost of a large amount of additional space. In this paper, we propose two methods that approximate the DF of a query term by exploiting information obtained while traversing the index structures. Experimental results show that our methods achieve almost the same effectiveness as exhaustive search while taking about half its search time.
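To make the DF problem concrete, the following is a minimal sketch of the exhaustive baseline only, not of the two approximation methods the paper proposes. It assumes a plain suffix array as a stand-in for the FM-index and a position-to-document list as a stand-in for the wavelet tree; the function names `build_index`, `suffix_range`, and `occurrences_and_df` are hypothetical. Because any substring can be a query term, DF cannot be looked up in a term dictionary: every occurrence in the matching suffix range has to be visited to count distinct documents.

```python
def build_index(docs):
    """Concatenate documents and build a naive suffix array (illustrative only)."""
    sep = "\x01"  # assumed separator; query terms must not contain it
    text = sep.join(docs) + sep
    # doc_of[i] is the id of the document that text position i belongs to.
    doc_of = []
    for doc_id, doc in enumerate(docs):
        doc_of.extend([doc_id] * (len(doc) + 1))  # +1 covers the separator
    # Quadratic-space construction, acceptable only for toy inputs.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    return text, sa, doc_of


def suffix_range(text, sa, term):
    """Return the half-open range [start, end) of suffixes that begin with term."""
    m = len(term)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is >= term
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < term:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose m-character prefix is > term
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= term:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def occurrences_and_df(index, term):
    """Exhaustively visit every occurrence of term and count distinct documents."""
    text, sa, doc_of = index
    start, end = suffix_range(text, sa, term)
    distinct_docs = {doc_of[sa[i]] for i in range(start, end)}
    return end - start, len(distinct_docs)


if __name__ == "__main__":
    index = build_index(["banana", "bandana", "cabana"])
    for term in ("an", "band", "xyz"):
        print(term, occurrences_and_df(index, term))
```

In this toy collection the term "an" occurs five times but in only three documents, so the size of the suffix range alone overstates DF. The cost of visiting every occurrence to deduplicate documents is what makes exact DF expensive at search time.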

Bibliographic details

Source
IEEE International Conference on Advanced Information Networking and Applications, 2015, pp. 541-546 (6 pages)
Authors
Suzuki Tokinori; Fujii Atsushi
Original format
PDF
Keywords
FM-index; approximate search; wavelet tree
