SYSTEM AND METHOD FOR PORTABLE DOCUMENT INDEXING USING N-GRAM WORD DECOMPOSITION
Abstract
A system and method provides for indexing and retrieval of stored documents using a decomposition of words in the documents into n-grams, or linear word subunits. The documents are indexed as pages in a number of banks. For each bank there is a bank index. The individual n-grams are identified for each page and are stored in the bank index. Each bank index further contains an entry map that indicates whether a given n-gram is present in any of the pages of the bank, and then provides an index to a page map that further indicates which page in the bank contains the n-gram. When a search query is input, the query words are decomposed into their n-grams. The query word n-grams are compared first with entry maps to determine if the query word n-grams appear on any page in the bank. If so, the associated page map is traversed to determine which page in the bank contains the query word n-grams. The n-grams on the page are compared with the query word n-grams to determine the presence of a match therebetween. Matching pages are flagged. When all pages in all banks have been processed, the pages are consolidated with respect to the documents to which they belong, resulting in a list of documents that match the search query. The results are displayed to a user.
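The lookup flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `BankIndex`, `entry_map`, `page_maps`, `matching_pages`, and `search` are hypothetical, the n-gram length of 3 is an assumption, and a page is assumed to "match" when every query n-gram occurs somewhere on it.

```python
N = 3  # assumed n-gram length; the abstract does not fix a value


def ngrams(word):
    """Decompose a word into its overlapping character n-grams."""
    word = word.lower()
    if len(word) < N:
        return {word}
    return {word[i:i + N] for i in range(len(word) - N + 1)}


class BankIndex:
    """Hypothetical index for one bank of pages."""

    def __init__(self):
        self.entry_map = {}    # n-gram -> slot in page_maps (present in bank?)
        self.page_maps = []    # slot -> set of page ids containing the n-gram
        self.page_ngrams = {}  # page id -> all n-grams found on that page
        self.page_doc = {}     # page id -> id of the owning document

    def add_page(self, page_id, doc_id, text):
        grams = set()
        for word in text.split():
            grams |= ngrams(word)
        self.page_ngrams[page_id] = grams
        self.page_doc[page_id] = doc_id
        for g in grams:
            slot = self.entry_map.setdefault(g, len(self.page_maps))
            if slot == len(self.page_maps):  # first time this n-gram is seen
                self.page_maps.append(set())
            self.page_maps[slot].add(page_id)

    def matching_pages(self, query_grams):
        candidates = None
        for g in query_grams:
            slot = self.entry_map.get(g)
            if slot is None:
                return set()  # entry map: n-gram absent from the whole bank
            pages = self.page_maps[slot]  # page map: pages holding the n-gram
            candidates = set(pages) if candidates is None else candidates & pages
        if not candidates:
            return set()
        # Compare the page's n-grams with the query n-grams and flag matches.
        return {p for p in candidates if query_grams <= self.page_ngrams[p]}


def search(banks, query):
    """Return the sorted ids of documents with at least one matching page."""
    query_grams = set()
    for word in query.split():
        query_grams |= ngrams(word)
    docs = set()
    for bank in banks:
        for page in bank.matching_pages(query_grams):
            docs.add(bank.page_doc[page])  # consolidate pages per document
    return sorted(docs)
```

In this sketch the entry map lets a bank be skipped as soon as one query n-gram is missing from it, so page maps are consulted only for banks that could contain a match. Because matching is on n-grams rather than whole words, a page can be flagged even when the query n-grams come from different words on that page.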