Space-Efficient Data Structures for Flexible Text Retrieval Systems

Abstract

We propose space-efficient data structures for text retrieval systems that combine the merits of theoretical data structures, such as suffix trees, with those of practical ones, such as inverted files. Traditional text retrieval systems use inverted files and support ranking queries based on the tf*idf (term frequency times inverse document frequency) scores of the documents that contain given keywords. Suffix trees alone cannot answer such queries. A drawback of these systems is that scores can be computed only for predetermined keywords. We extend the data structures so that scores can be computed efficiently for any pattern while keeping the size of the data structures moderate. Their size is comparable to the text size. This improves on existing methods, which use O(n log n) bits of space for a text collection of length n.
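The abstract's key query is a tf*idf score computed for an arbitrary pattern rather than a fixed keyword. As a rough illustration, the Python sketch below answers that query with a plain, uncompressed suffix array over the concatenated collection. This is not the paper's succinct data structure. It is the kind of O(n log n)-bit baseline the abstract says the paper improves on, because the suffix array stores n integers of about log2 n bits each.

Several choices here are hypothetical assumptions made for illustration only:
- the separator `\x00`,
- the naive suffix-array construction,
- the idf formula log(N/df),
- the helper names `build_index`, `sa_range` and `tfidf_scores`.

```python
import math
from collections import Counter

SEP = "\x00"  # assumed document separator; must not occur in any document


def build_index(docs):
    """Concatenate docs with SEP and build a plain (uncompressed) suffix array.

    Naive construction by sorting suffix strings: fine for a toy example,
    far too slow and memory-hungry for a real collection.
    """
    text = SEP.join(docs) + SEP
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # doc_of[i] is the index of the document that text position i belongs to
    doc_of = []
    for d, doc in enumerate(docs):
        doc_of.extend([d] * (len(doc) + 1))  # +1 covers the trailing separator
    return text, sa, doc_of


def sa_range(text, sa, pattern):
    """Return (lo, hi) so that sa[lo:hi] lists every suffix starting with pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def tfidf_scores(docs, pattern):
    """Score every document containing pattern by tf * log(N / df) (assumed idf variant)."""
    if not pattern or SEP in pattern:
        raise ValueError("pattern must be non-empty and must not contain SEP")
    text, sa, doc_of = build_index(docs)
    lo, hi = sa_range(text, sa, pattern)
    # tf: occurrences of the pattern per document, read off the suffix-array range
    tf = Counter(doc_of[sa[k]] for k in range(lo, hi))
    if not tf:
        return {}
    idf = math.log(len(docs) / len(tf))  # len(tf) is the document frequency df
    return {d: count * idf for d, count in tf.items()}


if __name__ == "__main__":
    print(tfidf_scores(["abracadabra", "cadabra", "xyz"], "abra"))
```

In the example, "abra" occurs twice in document 0 and once in document 1, and it appears in 2 of the 3 documents. Those two documents therefore score 2·log(1.5) and log(1.5). Document 2 does not contain the pattern and is omitted. Because the pattern is located by binary search over sorted suffixes, any substring can be scored without a predefined keyword list. That is the flexibility the abstract contrasts with inverted files.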

Bibliographic details

Source: International Symposium on Algorithms and Computation, 2002, 11 pages
Author: Kunihiko Sadakane
Original format: PDF
Chinese Library Classification: TP301.6-532

Similar documents

1. Unique-order interpolative coding for fast querying and space-efficient indexing in information retrieval systems [J]. Cher-Sheng Cheng, Jean Jyh-Jiun Shann, Chung-Ping Chung. Information Processing & Management, 2006, No. 2.
2. Proposed Architecture for Automatic Conversion of Unstructured Text Data into Structured Text Data on the Web [J]. CH. Madhusudhan, K. Mrithyunjaya Rao. International Journal of Computer Science and Network Security, 2013, No. 12.
3. Full text retrieval for huge Volumes of data in patent system "PSEARCH/DB" [J]. Shinya Nakamoto. Nippon Steel Technical Report, 1998, No. 76.
4. Space-Efficient Data Structures for Flexible Text Retrieval Systems [C]. Kunihiko Sadakane. Algorithms and Computation, 2002.
5. A scalable and flexible unstructured search system and distributed data structures for peer-to-peer networks [D]. Tae Woong Choi. 2010.
6. Leveraging word embeddings and medical entity extraction for biomedical dataset retrieval using unstructured texts [O]. Yanshan Wang, Majid Rastegar-Mojarad, Ravikumar Komandur-Elayavilli, 2017.
7. Succinct data structures for flexible text retrieval systems [O]. Kunihiko Sadakane. 2007.
8. Preliminary Study to Develop a Data Acquisition System to Monitor Strains at the Bottom Layers of Flexible Pavement Structures in Rhode Island. Estimation of Layer Coefficients for Design of Flexible Pavement Structures in Rhode Island [R]. K. W. Lee, A. S. Marcus, V. M. Thakur. 1994.
