Two-level n-gram index structure and methods of index building, query processing and index derivation

Abstract

Disclosed are a structure of a two-level n-gram inverted index and methods of building the index, processing queries, and deriving the index. They reduce the size of the n-gram inverted index and improve query performance by eliminating the redundancy of the position information that exists in the n-gram inverted index.

The inverted index of the present invention comprises a back-end inverted index that uses subsequences extracted from documents as terms and a front-end inverted index that uses n-grams extracted from the subsequences as terms. The back-end inverted index uses as terms subsequences of a specific length, extracted from the documents so that consecutive subsequences overlap each other by n−1 characters (n: the length of the n-gram), and stores the positions at which each subsequence occurs in the documents in a posting list for that subsequence. The front-end inverted index uses as terms n-grams of a specific length, extracted from the subsequences using a 1-sliding technique, and stores the positions at which each n-gram occurs in the subsequences in a posting list for that n-gram.
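The two levels described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `build_two_level_index` and `ngram_positions`, the parameter `m` for the subsequence length, the use of plain dictionaries as posting lists, and the choice to leave the final subsequence of a document shorter than `m` rather than pad it are all assumptions made for illustration.

```python
from collections import defaultdict


def build_two_level_index(docs, m, n):
    """Build a toy two-level n-gram index (illustrative sketch only).

    docs: list of strings; m: subsequence length; n: n-gram length.
    Returns (back_end, front_end) as plain dictionaries.
    """
    if not 1 <= n <= m:
        raise ValueError("require 1 <= n <= m")
    # Consecutive subsequences overlap by n - 1 characters.
    step = m - (n - 1)

    # Back-end: subsequence -> [(doc_id, offset of subsequence in document)]
    back_end = defaultdict(list)
    for doc_id, text in enumerate(docs):
        # Assumption: a trailing subsequence may be shorter than m;
        # starts that leave fewer than n characters are skipped.
        for start in range(0, max(len(text) - n + 1, 0), step):
            back_end[text[start:start + m]].append((doc_id, start))

    # Front-end: n-gram -> [(subsequence, offset of n-gram in subsequence)]
    # Each distinct subsequence is scanned once with a 1-sliding window.
    front_end = defaultdict(list)
    for subseq in back_end:
        for off in range(len(subseq) - n + 1):
            front_end[subseq[off:off + n]].append((subseq, off))

    return dict(back_end), dict(front_end)


def ngram_positions(back_end, front_end, gram):
    """Recover (doc_id, offset) pairs for one n-gram by joining both levels."""
    hits = set()
    for subseq, off in front_end.get(gram, []):
        for doc_id, start in back_end[subseq]:
            hits.add((doc_id, start + off))
    return sorted(hits)


docs = ["ABCDDABBCD", "ABCDABCD"]
back_end, front_end = build_two_level_index(docs, m=4, n=2)
print(back_end["ABCD"])
print(ngram_positions(back_end, front_end, "CD"))
```

In this sketch the subsequence "ABCD" occurs in both documents, but its n-grams are stored only once in the front-end index; the per-document offsets live only in the back-end posting list. This is the kind of position-information redundancy the abstract says the structure eliminates.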