Data structures for information retrieval

Abstract

Efficiently indexing large document collections for information retrieval places heavy demands on a computer's memory and processor, and requires judicious use of these resources. In this paper, we describe our approach to constructing such an index based on the vector-space model (VSM). We review the stages involved in generating an index, weighting the index terms, and representing documents in the VSM. We explain our choice of data structures at each stage, from parsing the document collection through generating index terms to generating document representations, along with the tradeoffs involved. We then demonstrate the approach using the OHSUMED data set. Our results show that even with only a modest amount of main memory (4 GB), large data sets such as OHSUMED can be indexed quickly.
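
The stages named in the abstract (parsing, index-term generation, term weighting, document representation) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify its data structures in detail, so the sketch assumes a Python `dict` as the dictionary, plain lists as postings, a simple regex tokenizer, and tf-idf weighting with a natural-log idf. All function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumeric runs (a simplifying assumption)."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def build_index(docs):
    """Parse documents into a dictionary mapping term -> postings list.

    Each posting is a (doc_id, term_frequency) pair.
    """
    postings = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for term, tf in Counter(tokenize(text)).items():
            postings[term].append((doc_id, tf))
    return postings


def document_vectors(postings, n_docs):
    """Represent each document as a sparse tf-idf vector: doc_id -> {term: weight}."""
    vectors = defaultdict(dict)
    for term, plist in postings.items():
        # idf is 0 for a term that occurs in every document.
        idf = math.log(n_docs / len(plist))
        for doc_id, tf in plist:
            vectors[doc_id][term] = tf * idf
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    # Toy collection, invented for illustration only.
    docs = [
        "heart attack treatment",
        "heart surgery outcomes",
        "diabetes treatment outcomes",
    ]
    index = build_index(docs)
    vecs = document_vectors(index, len(docs))
    print(index["heart"])
    print(cosine(vecs[0], vecs[1]))
```

Storing each document vector sparsely keeps memory proportional to the number of distinct terms per document rather than to the vocabulary size, which is one way to read the abstract's concern with limited main memory.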


Bibliographic details

Source
IST-Africa Conference Exhibition, 2014, pp. 1-8 (8 pages)
Author
Nkweteyim Denis L.
Original format
PDF
Keywords
Computational modeling; Data structures; Dictionaries; Indexes; Random access memory; Vectors; Information retrieval; binary search tree; data structures; dictionary; index; linked list; posting; term frequency; vector-space model

Added to database: 2022-08-26 15:00:16

Similar literature

1. How to Use Relational Databases Data retrieval with structured query language [J]. Diane Dolezel. Journal of AHIMA, 2015, No. 11.
2. Collection and retrieval of structured clinical data from electronic patient records in general practice. A first-phase study to create a health care database for research and quality assessment [J]. Mansson J, Nilsson G, Bjorkelund C. Scandinavian journal of primary health care, 2004, No. 1.
3. Collection and retrieval of structured clinical data from electronic patient records in general practice. A first-phase study to create a health care database for research and quality assessment [J]. Scandinavian journal of primary health care, 2004, No. 1.
4. Synthetic retrieval technology for structured data and Non-structured data [C]. Zhaoshun Wang, Guicheng Shen, Jinjin Huang. The 2nd International Conference on Information Science and Engineering, 2010.
5. Dynamic data structures for geometric search and retrieval [D]. Park, Eunhui. 2013.
6. Query expansion using MeSH terms for dataset retrieval: OHSU at the bioCADDIE 2016 dataset retrieval challenge [O]. Theodore B Wright, David Ball, William Hersh. 2017.
7. Retrieval of Structured and Unstructured Data with vitrivr [O]. Luca Rossetto, Ralph Gasser, Silvan Heller, 2019.
8. Formalizing structured file services for the data storage and retrieval subsystem of the data management system for Spacestation Freedom [R]. Jamsek, Damir A. 1993.
