
SYSTEM AND METHOD FOR PORTABLE DOCUMENT INDEXING USING N-GRAM WORD DECOMPOSITION


Abstract

A system and method provides for indexing and retrieval of stored documents using a decomposition of the words in the documents into n-grams, or linear word subunits. The documents are indexed as pages in a number of banks, and each bank has a bank index. The individual n-grams are identified for each page and are stored in the bank index. Each bank index further contains an entry map that indicates whether a given n-gram is present in any of the pages of the bank, and then provides an index to a page map that further indicates which page in the bank contains the n-gram. When a search query is input, the query words are decomposed into their n-grams. The query word n-grams are compared first with the entry maps to determine whether the query word n-grams appear on any page in the bank. If so, the associated page map is traversed to determine which page in the bank contains the query word n-grams. The n-grams on that page are then compared with the query word n-grams to determine whether they match. Matching pages are flagged. When all pages in all banks have been processed, the flagged pages are consolidated with respect to the documents to which they belong, resulting in a list of documents that match the search query. The results are displayed to a user.
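
The indexing and query steps in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the n-gram length (character trigrams), the bank capacity of four pages, the integer-bitmask representation of each page map, and the match rule (a page matches only if it contains every query n-gram) are all assumptions, and the names `word_ngrams`, `text_ngrams`, `Bank`, `build_banks`, `search`, `N`, and `PAGES_PER_BANK` are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

N = 3               # hypothetical n-gram length (the abstract does not fix n)
PAGES_PER_BANK = 4  # hypothetical bank capacity


def word_ngrams(word: str, n: int = N) -> set[str]:
    """Decompose one word into its overlapping character n-grams."""
    w = word.lower()
    if len(w) <= n:
        return {w}  # words no longer than n are kept whole
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def text_ngrams(text: str, n: int = N) -> set[str]:
    """Union of the n-grams of every word in a page or query."""
    grams: set[str] = set()
    for word in re.findall(r"\w+", text):
        grams |= word_ngrams(word, n)
    return grams


@dataclass
class Bank:
    entry_map: dict[str, int] = field(default_factory=dict)   # n-gram -> slot in page_maps
    page_maps: list[int] = field(default_factory=list)        # bit p set if page p has the n-gram
    page_grams: list[set[str]] = field(default_factory=list)  # n-grams stored for each page
    page_docs: list[str] = field(default_factory=list)        # owning document of each page

    def add_page(self, doc_id: str, text: str) -> None:
        p = len(self.page_grams)
        grams = text_ngrams(text)
        self.page_grams.append(grams)
        self.page_docs.append(doc_id)
        for g in grams:
            if g not in self.entry_map:  # first time this n-gram appears in the bank
                self.entry_map[g] = len(self.page_maps)
                self.page_maps.append(0)
            self.page_maps[self.entry_map[g]] |= 1 << p


def build_banks(pages: list[tuple[str, str]]) -> list[Bank]:
    """Index (doc_id, page_text) pairs, opening a new bank when one is full."""
    banks: list[Bank] = []
    for doc_id, text in pages:
        if not banks or len(banks[-1].page_grams) == PAGES_PER_BANK:
            banks.append(Bank())
        banks[-1].add_page(doc_id, text)
    return banks


def search(banks: list[Bank], query: str) -> list[str]:
    """Return ids of documents having a page that contains every query n-gram."""
    q = text_ngrams(query)
    if not q:
        return []
    flagged: list[str] = []  # owning document of each flagged page
    for bank in banks:
        # 1. Entry map: which query n-grams occur anywhere in this bank?
        slots = [bank.entry_map[g] for g in q if g in bank.entry_map]
        if not slots:
            continue  # no page in this bank can match
        # 2. Page maps: candidate pages holding at least one query n-gram.
        candidates = 0
        for s in slots:
            candidates |= bank.page_maps[s]
        # 3. Compare each candidate page's n-grams with the query n-grams.
        p = 0
        while candidates:
            if candidates & 1 and q <= bank.page_grams[p]:
                flagged.append(bank.page_docs[p])
            candidates >>= 1
            p += 1
    # 4. Consolidate flagged pages into documents (first-seen order, no repeats).
    return list(dict.fromkeys(flagged))


if __name__ == "__main__":
    pages = [
        ("doc1", "Portable document indexing using n-grams"),
        ("doc1", "uses n-gram word decomposition"),
        ("doc2", "Bank index with entry map"),
        ("doc3", "Page map for each bank"),
        ("doc3", "Query words are decomposed"),  # lands in a second bank
    ]
    banks = build_banks(pages)
    print(search(banks, "decompos"))  # ['doc1', 'doc3']
    print(search(banks, "gram"))      # ['doc1'] (two matching pages, one document)
```

In this sketch the page maps only narrow each bank to candidate pages that share at least one n-gram with the query; the per-page comparison then rejects partial overlaps, for example the page containing "decomposed" when the query is "decomposition". Holding each page map as an integer bitmask keeps the candidate lookup to a few bitwise operations per n-gram; an actual system may well use a different layout.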
