Efficient storage mechanism for representing term occurrence in unstructured text documents

Abstract

A method and structure convert a document corpus containing an ordered plurality of documents into a compact in-memory representation of occurrence data. The representation is based on a dictionary previously developed for the document corpus, in which each term has a corresponding unique integer associated with it. The method includes developing a first vector for the entire document corpus: a sequential listing of the unique integers in which each document in the corpus is represented, in sequence, according to the occurrence in that document of the corresponding dictionary terms. A second vector, also developed for the entire document corpus, indicates the location of each document's representation in the first vector.
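The two-vector layout described in the abstract can be illustrated with a minimal sketch. This is not the patent's implementation: the names `build_occurrence_vectors`, `terms_of`, `all_terms`, and `doc_starts` are hypothetical, documents are assumed to be already tokenized, tokens missing from the dictionary are assumed to be skipped, term IDs are assumed to be stored in order of appearance with repeats kept, and the trailing sentinel entry in the second vector is a convenience added here.

```python
from typing import Dict, List, Sequence, Tuple


def build_occurrence_vectors(
    corpus: Sequence[Sequence[str]],
    dictionary: Dict[str, int],
) -> Tuple[List[int], List[int]]:
    """Build the two corpus-wide vectors (hypothetical helper).

    all_terms  : first vector, term IDs of every document, back to back.
    doc_starts : second vector, offset in all_terms where each document begins.
    """
    all_terms: List[int] = []
    doc_starts: List[int] = []
    for doc in corpus:
        # Record where this document's run of term IDs starts.
        doc_starts.append(len(all_terms))
        for token in doc:
            term_id = dictionary.get(token)
            # Assumption: tokens that are not dictionary terms are dropped.
            if term_id is not None:
                all_terms.append(term_id)
    # Assumption: a sentinel end offset so the last document can be sliced
    # the same way as the others.
    doc_starts.append(len(all_terms))
    return all_terms, doc_starts


def terms_of(doc_index: int, all_terms: List[int], doc_starts: List[int]) -> List[int]:
    """Return the term IDs stored for one document."""
    return all_terms[doc_starts[doc_index]:doc_starts[doc_index + 1]]


# Toy data, invented for illustration only.
dictionary = {"disk": 0, "error": 1, "printer": 2}
corpus = [["disk", "error", "disk"], ["the", "printer"], [], ["error"]]
all_terms, doc_starts = build_occurrence_vectors(corpus, dictionary)
```

With this toy corpus, `all_terms` is `[0, 1, 0, 2, 1]` and `doc_starts` is `[0, 3, 4, 4, 5]`, so `terms_of(1, all_terms, doc_starts)` yields `[2]`. Storage grows with the number of term occurrences plus the number of documents, rather than with documents times dictionary size as a dense document-term matrix would.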