TWO-LEVEL N-GRAM INDEX STRUCTURE, METHOD OF BUILDING INDEX, METHOD OF PROCESSING QUERY, AND METHOD OF DERIVING INDEX

Abstract

Disclosed are a two-level n-gram inverted index structure and methods of building the index, processing queries, and deriving the index, which reduce the size of the n-gram inverted index and improve query performance by eliminating the redundancy of position information that exists in the n-gram inverted index. The inverted index of the present invention comprises a back-end inverted index that uses subsequences extracted from documents as terms and a front-end inverted index that uses n-grams extracted from the subsequences as terms. The back-end inverted index uses, as terms, subsequences of a specific length extracted from the documents so that they overlap each other by n-1 (n: the length of the n-gram), and stores the position information of the subsequences occurring in the documents in a posting list for each subsequence. The front-end inverted index uses, as terms, n-grams of a specific length extracted from the subsequences using a 1-sliding technique, and stores the position information of the n-grams occurring in the subsequences in a posting list for each n-gram.
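
The two index levels described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the in-memory dictionaries, the subsequence length parameter `m`, and the handling of a shorter final subsequence (kept as-is rather than padded) are all assumptions made for illustration. Only the overlap of n-1 between subsequences and the 1-sliding extraction of n-grams come from the abstract.

```python
from collections import defaultdict


def extract_subsequences(text, m, n):
    """Yield (offset, subsequence) pairs of length m that overlap by n-1.

    Assumption: the last subsequence may be shorter than m (no padding).
    """
    if m < n:
        raise ValueError("subsequence length m must be at least n")
    if not text:
        return
    step = m - (n - 1)  # consecutive subsequences share n-1 characters
    start = 0
    while True:
        yield start, text[start:start + m]
        if start + m >= len(text):
            break
        start += step


def build_two_level_index(docs, m, n):
    """Build hypothetical back-end and front-end inverted indexes.

    back_end:  subsequence -> [(doc_id, offset of subsequence in document)]
    front_end: n-gram      -> [(subsequence, offset of n-gram in subsequence)]
    """
    back_end = defaultdict(list)
    for doc_id, text in docs.items():
        for offset, sub in extract_subsequences(text, m, n):
            back_end[sub].append((doc_id, offset))

    front_end = defaultdict(list)
    for sub in back_end:  # each distinct subsequence is indexed only once
        for i in range(len(sub) - n + 1):  # 1-sliding window over the subsequence
            front_end[sub[i:i + n]].append((sub, i))
    return dict(back_end), dict(front_end)


def ngram_positions(back_end, front_end, ngram):
    """Recover (doc_id, offset) positions of an n-gram by joining both levels."""
    hits = set()
    for sub, inner in front_end.get(ngram, []):
        for doc_id, outer in back_end[sub]:
            hits.add((doc_id, outer + inner))
    return sorted(hits)


if __name__ == "__main__":
    docs = {0: "ABCDDABBCD", 1: "ABCDDAB"}
    back, front = build_two_level_index(docs, m=4, n=2)
    print(back["ABCD"])                       # subsequence postings
    print(front["AB"])                        # n-gram postings within subsequences
    print(ngram_positions(back, front, "AB"))  # positions in the documents
```

In this sketch the subsequence "ABCD" occurs in both documents, but the offsets of its n-grams are stored only once, in the front-end index. That sharing is the kind of position-information redundancy the abstract says the two-level structure eliminates.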