Device and method for indexing documents according to a transmitted n-gram word decomposition

Abstract

A system and method provide for indexing and retrieval of stored documents by decomposing the words in the documents into n-grams, or linear word subunits. The documents are indexed as pages in a number of banks, and each bank has a bank index. The individual n-grams identified for each page are stored in the bank index. Each bank index further contains an entry map that indicates whether a given n-gram is present in any of the pages of the bank and provides an index to a page map, which in turn indicates which pages in the bank contain the n-gram. When a search query is input, the query words are decomposed into their n-grams. The query word n-grams are first compared with the entry maps to determine whether they appear on any page in the bank. If so, the associated page map is traversed to determine which page in the bank contains the query word n-grams. The n-grams on the page are then compared with the query word n-grams to determine whether there is a match between them. Matching pages are flagged. When all pages in all banks have been processed, the flagged pages are consolidated with respect to the documents to which they belong, resulting in a list of documents that match the search query. The results are displayed to a user.
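
The indexing and search flow described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation. The names `ngrams`, `Bank`, `add_page`, `matching_pages` and `search` are hypothetical, as are the choice of trigrams (n = 3) and the use of dictionaries and sets for the entry map and page maps; the abstract does not specify any of these details.

```python
def ngrams(word, n=3):
    """Return the set of contiguous n-character subunits of a word.
    Words shorter than n are kept whole (an assumption of this sketch)."""
    word = word.lower()
    if len(word) < n:
        return {word}
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def text_ngrams(text, n=3):
    """Decompose every whitespace-separated word of a text into n-grams."""
    grams = set()
    for word in text.split():
        grams |= ngrams(word, n)
    return grams


class Bank:
    """A group of pages together with its bank index (hypothetical layout)."""

    def __init__(self):
        self.pages = []       # page number -> (document id, n-grams on the page)
        self.entry_map = {}   # n-gram -> index into self.page_maps
        self.page_maps = []   # each page map: set of page numbers holding the n-gram

    def add_page(self, doc_id, text, n=3):
        grams = text_ngrams(text, n)
        page_no = len(self.pages)
        self.pages.append((doc_id, grams))
        for gram in grams:
            if gram not in self.entry_map:
                self.entry_map[gram] = len(self.page_maps)
                self.page_maps.append(set())
            self.page_maps[self.entry_map[gram]].add(page_no)

    def matching_pages(self, query_grams):
        """Return the page numbers in this bank flagged as matching."""
        # Step 1: the entry map tells us whether each n-gram occurs in the bank at all.
        if not query_grams or any(g not in self.entry_map for g in query_grams):
            return set()
        # Step 2: traverse the page maps to find pages holding all query n-grams.
        candidates = set.intersection(
            *(self.page_maps[self.entry_map[g]] for g in query_grams)
        )
        # Step 3: compare the n-grams stored for each candidate page with the query.
        return {p for p in candidates if query_grams <= self.pages[p][1]}


def search(banks, query, n=3):
    """Consolidate flagged pages into a sorted list of matching document ids.
    Matching on n-grams is approximate: a page may contain every n-gram
    of a query word without containing the word itself."""
    query_grams = text_ngrams(query, n)
    docs = set()
    for bank in banks:
        for page_no in bank.matching_pages(query_grams):
            docs.add(bank.pages[page_no][0])
    return sorted(docs)


# Usage with made-up documents split into pages across two banks.
banks = [Bank(), Bank()]
banks[0].add_page("doc1", "indexing stored documents")
banks[0].add_page("doc1", "retrieval of pages")
banks[1].add_page("doc2", "search query words")
print(search(banks, "documents"))
```

In this sketch the entry map lets a whole bank be skipped as soon as one query n-gram is absent from it, which is the early-rejection step the abstract describes before any page map is traversed.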
