METHODS FOR OBTAINING IMPROVED TEXT SIMILARITY MEASURES
Abstract
Embodiments of the invention provide methods for obtaining improved text similarity measures. More specifically, a method of measuring similarity between at least two electronic documents begins by identifying similar terms between the documents. Similarity between terms is based on patterns, which can include word patterns, letter patterns, numeric patterns, and/or alphanumeric patterns, and identifying the similar terms includes identifying multiple pattern types between the documents. Basing similarity on patterns also identifies terms within the documents that fall within a category of a hierarchy. Specifically, the identification reviews a hierarchical data tree whose nodes represent terms within the electronic documents: lower nodes of the tree hold specific terms, and higher nodes hold general terms.
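The abstract does not specify an algorithm, so the following is only a minimal illustrative sketch of how hierarchy-based and pattern-based term matching might be combined. Everything in it is an assumption made for illustration: the toy `PARENT` hierarchy, the `shape` reduction used as a stand-in for alphanumeric patterns, the helper names, and the scoring formula. None of it is taken from the patent itself.

```python
import re

# Hypothetical toy hierarchy (assumption): specific term -> more general term.
# Lower nodes are specific ("poodle"), higher nodes are general ("animal").
PARENT = {
    "poodle": "dog",
    "beagle": "dog",
    "dog": "animal",
    "cat": "animal",
}


def ancestors(term):
    """Return the term followed by its increasingly general ancestors."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def shape(term):
    """Reduce a term to a pattern: digits become '9', ASCII letters become 'A'."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", term))


def terms_similar(a, b):
    """Treat two terms as similar if they share a hierarchy node, or if both
    contain digits and have the same letter/digit shape (illustrative rule)."""
    a, b = a.lower(), b.lower()
    if set(ancestors(a)) & set(ancestors(b)):
        return True
    both_have_digits = any(c.isdigit() for c in a) and any(c.isdigit() for c in b)
    return both_have_digits and shape(a) == shape(b)


def document_similarity(doc_a, doc_b):
    """Fraction of terms in either document with a similar term in the other."""
    terms_a, terms_b = doc_a.split(), doc_b.split()
    if not terms_a or not terms_b:
        return 0.0
    matched_a = sum(any(terms_similar(x, y) for y in terms_b) for x in terms_a)
    matched_b = sum(any(terms_similar(y, x) for x in terms_a) for y in terms_b)
    return (matched_a + matched_b) / (len(terms_a) + len(terms_b))
```

Under these assumptions, "poodle" and "beagle" match because both fall under the more general node "dog", while "AB123" and "XY987" match because they share the same alphanumeric shape. An exact string comparison would find no overlap between those two documents.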