Scalable approach to information-theoretic string similarity using a guaranteed rank threshold
Abstract
A string analysis tool calculates a similarity metric between an input string and a plurality of strings in a collection to be searched. The tool may include optimizations that reduce the number of calculations carried out when computing the similarity metric over large volumes of data. To this end, the tool may represent strings as features. Analysis may then be performed relative to features (e.g., of either the input string or the plurality of strings to be searched), so that features can be eliminated from consideration when identifying the candidate strings in the collection for which a similarity metric is to be calculated. The elimination of features may be based on a minimum similarity metric threshold: features that are incapable of contributing to a similarity metric above that threshold are eliminated from consideration.
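The feature-elimination idea in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration rather than the patented method: features are taken to be character bigrams, each feature is weighted by its self-information -log(df / (N + 1)), the similarity metric is a Lin-style ratio 2·W(common) / (W(query) + W(candidate)), and the hypothetical class `FeatureIndex` and its methods are invented names. Under those assumptions, a candidate that shares only a set of query features with total weight W_d can score at most 2·W_d / (W_q + W_d), so the most common query features can be skipped as long as W_d stays below t·W_q / (2 - t) for threshold t.

```python
import math
from collections import defaultdict


def features(s, n=2):
    """Character n-grams of a padded, lowercased string (assumed feature choice)."""
    padded = f"^{s.lower()}$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


class FeatureIndex:
    """Hypothetical inverted index from features to the strings containing them."""

    def __init__(self, strings):
        self.strings = list(strings)
        self.feats = [features(s) for s in self.strings]
        self.postings = defaultdict(set)
        for i, fs in enumerate(self.feats):
            for f in fs:
                self.postings[f].add(i)
        n = len(self.strings)
        # Self-information of each feature; n + 1 keeps every weight positive.
        self.weight = {f: -math.log(len(ids) / (n + 1))
                       for f, ids in self.postings.items()}
        self.unseen = math.log(n + 1)  # weight for features absent from the index

    def _w(self, f):
        return self.weight.get(f, self.unseen)

    def similarity(self, qf, cf):
        """Lin-style metric: shared information over total information."""
        common = sum(self._w(f) for f in qf & cf)
        total = sum(self._w(f) for f in qf) + sum(self._w(f) for f in cf)
        return 2.0 * common / total

    def kept_features(self, qf, threshold):
        """Drop the lowest-information query features that cannot, on their
        own, lift any candidate to the threshold (0 < threshold < 1)."""
        total = sum(self._w(f) for f in qf)
        budget = threshold * total / (2.0 - threshold)
        dropped, kept = 0.0, []
        for f in sorted(qf, key=self._w):  # most common features first
            if not kept and dropped + self._w(f) < budget:
                dropped += self._w(f)      # still safe to ignore this feature
            else:
                kept.append(f)
        return kept

    def search(self, query, threshold):
        """Score only candidates that share at least one kept feature."""
        qf = features(query)
        candidates = set()
        for f in self.kept_features(qf, threshold):
            candidates |= self.postings.get(f, set())
        scored = [(self.strings[i], self.similarity(qf, self.feats[i]))
                  for i in candidates]
        return sorted((r for r in scored if r[1] >= threshold),
                      key=lambda r: -r[1])

    def brute_force(self, query, threshold):
        """Reference implementation: score every string in the collection."""
        qf = features(query)
        scored = [(s, self.similarity(qf, fs))
                  for s, fs in zip(self.strings, self.feats)]
        return sorted((r for r in scored if r[1] >= threshold),
                      key=lambda r: -r[1])
```

For example, `FeatureIndex(names).search("Acme Corp", 0.5)` should return the same strings as `brute_force` with the same arguments, while consulting fewer posting lists. The higher the threshold, the larger the budget of common features that can be ignored, which is where the saving on large collections would come from in this sketch.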