Efficient storage mechanism for representing term occurrence in unstructured text documents
Abstract
A method and structure convert a document corpus containing an ordered plurality of documents into a compact in-memory representation of term-occurrence data. The representation is based on a dictionary previously developed for the document corpus, in which each term has a corresponding unique integer. The method includes developing a first vector for the entire document corpus: a sequential listing of the unique integers in which each document in the corpus is represented in turn, according to the occurrence in that document of the corresponding dictionary terms. A second vector, also developed for the entire document corpus, indicates the location of each document's representation in the first vector.
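The two vectors can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the dictionary is a plain `dict` mapping each term to its unique integer, that documents are tokenized by lowercasing and splitting on whitespace, that every occurrence (including repeats) is listed in document order, and that the second vector stores each document's start offset plus a final end sentinel. The function names `build_occurrence_vectors` and `document_terms` are hypothetical.

```python
from typing import Dict, List, Tuple


def build_occurrence_vectors(
    documents: List[str], dictionary: Dict[str, int]
) -> Tuple[List[int], List[int]]:
    """Encode an ordered corpus as (term_ids, offsets).

    term_ids: first vector; each document's dictionary-term occurrences,
              mapped to their integers and concatenated in corpus order.
    offsets:  second vector; offsets[i] is where document i starts in
              term_ids, and offsets[-1] == len(term_ids) as an end sentinel.
    """
    term_ids: List[int] = []
    offsets: List[int] = []
    for doc in documents:
        offsets.append(len(term_ids))  # start of this document's run
        # Assumed tokenizer (hypothetical): lowercase + whitespace split.
        for token in doc.lower().split():
            code = dictionary.get(token)
            if code is not None:  # skip tokens absent from the dictionary
                term_ids.append(code)
    offsets.append(len(term_ids))  # sentinel so the last document has an end
    return term_ids, offsets


def document_terms(term_ids: List[int], offsets: List[int], i: int) -> List[int]:
    """Return the integer codes for document i by slicing the first vector."""
    return term_ids[offsets[i]:offsets[i + 1]]


if __name__ == "__main__":
    dictionary = {"storage": 0, "text": 1, "term": 2, "vector": 3}
    corpus = [
        "Text term storage",
        "vector vector text",
        "nothing relevant here",
    ]
    ids, offs = build_occurrence_vectors(corpus, dictionary)
    print(ids)   # [1, 2, 0, 3, 3, 1]
    print(offs)  # [0, 3, 6, 6]
    print(document_terms(ids, offs, 1))  # [3, 3, 1]
```

Storing start offsets with a trailing sentinel lets any document's slice be recovered with two lookups, and a document with no dictionary terms simply repeats the previous offset. For a more compact in-memory layout, the lists could be swapped for `array.array` or NumPy integer arrays; plain lists keep the sketch short.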