METHOD AND SYSTEM FOR IMPLEMENTING APPROXIMATE STRING COMPARISON IN A DATABASE

Abstract

1. A computer-based method of comparing a candidate character string with a plurality of character string records stored in a database, said method comprising:
a) identifying a set of reference character strings in the database, the reference character strings being identified using an optimized search for a set of dissimilar character strings;
b) generating an n-gram representation of a reference character string in the set of reference character strings;
c) generating an n-gram representation of the candidate character string;
d) determining the similarity between the n-gram representations;
e) repeating steps b) and d) for the remaining reference character strings in the identified set of reference character strings; and
f) indexing the candidate character string in the database, based on the similarity determined between the n-gram representation of the candidate character string and those of the reference character strings in the identified set.

2. The computer-based method of claim 1, characterized in that determining the similarity between the n-gram representations comprises:
calculating a two-dimensional vector containing the frequency of occurrence of each unique n-gram in the candidate character string and the frequency of occurrence of each unique n-gram in the reference character string; and
calculating a similarity metric for the candidate character string with respect to the reference character strings, based on the two-dimensional vector.

3. The computer-based method according to claim 2, characterized in that the calculation of the similarity metric for the candidate character string includes the use of a Structured Query Language computation for comparing the contents of a two-dimensional
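
Steps b) to e) of claim 1 and the frequency vector of claim 2 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claims do not fix the n-gram length, any normalization, or the similarity metric, so the trigram length (n = 3), the case folding, and the cosine metric below are assumptions, and all function names are hypothetical. Step a) (selecting the reference set) and step f) (indexing) are not covered.

```python
from collections import Counter
from math import sqrt


def ngram_counts(text, n=3):
    """Count overlapping character n-grams (n=3 and case folding are assumptions)."""
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def frequency_table(candidate, reference, n=3):
    """Rows of (n-gram, candidate frequency, reference frequency).

    One reading of the 'two-dimensional vector' in claim 2: one row per
    unique n-gram, one frequency column per string.
    """
    c, r = ngram_counts(candidate, n), ngram_counts(reference, n)
    return [(g, c[g], r[g]) for g in sorted(set(c) | set(r))]


def cosine_similarity(table):
    """Cosine of the angle between the two frequency columns (assumed metric)."""
    dot = sum(cf * rf for _, cf, rf in table)
    norm_c = sqrt(sum(cf * cf for _, cf, _ in table))
    norm_r = sqrt(sum(rf * rf for _, _, rf in table))
    return dot / (norm_c * norm_r) if norm_c and norm_r else 0.0


def best_match(candidate, references, n=3):
    """Score the candidate against each reference string; return (score, reference)."""
    scored = [(cosine_similarity(frequency_table(candidate, ref, n)), ref)
              for ref in references]
    return max(scored)
```

For example, `best_match("Acme Corp.", ["Acme Corp", "Apex Ltd"])` scores the candidate against each reference string in turn, as in step e), and returns the highest-scoring pair. Strings shorter than n characters produce no n-grams and score 0.0 under this sketch.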