1. A computer-based method of comparing a candidate character string with a plurality of character string records stored in a database, said method comprising:
a) identifying a set of reference character strings in the database, the reference character strings being identified using an optimized set of dissimilar search character strings;
b) generating an n-gram representation of one reference character string in the set of reference character strings;
c) generating an n-gram representation of the candidate character string;
d) determining the similarity between the n-gram representations;
e) repeating steps b) and d) for the remaining reference character strings in the identified set of reference character strings; and
f) indexing the candidate character string in the database, based on the determined similarity between the n-gram representations of the candidate character string and of the reference character strings in the identified set.

2. The computer-based method of claim 1, characterized in that determining the similarity between the n-gram representations comprises:
calculating a two-dimensional vector containing the frequency of occurrence of each unique n-gram in the candidate character string and the frequency of occurrence of each unique n-gram in the reference character string; and
calculating a similarity metric for the candidate character string with respect to the reference character string, based on the two-dimensional vector.

3. The computer-based method according to claim 2, characterized in that calculating the similarity metric for the candidate character string includes using a Structured Query Language computation to compare the contents of a two-dimensional
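
For illustration only, steps b) through e) of claim 1 and the two-dimensional frequency vector of claim 2 might be sketched as follows. The claims do not name a specific similarity metric, n-gram length, or normalization, so the cosine measure, the default n = 3, the lower-casing, and every function name here are assumptions made for this sketch. It uses plain Python rather than the Structured Query Language computation mentioned in claim 3.

```python
from collections import Counter
from math import sqrt


def ngram_counts(text, n=3):
    """Count every overlapping n-gram of `text` (lower-cased; an assumption)."""
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def frequency_table(candidate, reference, n=3):
    """Build one row per unique n-gram: (n-gram, candidate count, reference count).

    This plays the role of the two-dimensional frequency vector of claim 2.
    """
    cand = ngram_counts(candidate, n)
    ref = ngram_counts(reference, n)
    grams = sorted(set(cand) | set(ref))
    # Counter returns 0 for n-grams that do not occur in a string.
    return [(g, cand[g], ref[g]) for g in grams]


def cosine_similarity(table):
    """Cosine of the angle between the two count columns (assumed metric)."""
    dot = sum(c * r for _, c, r in table)
    norm_c = sqrt(sum(c * c for _, c, _r in table))
    norm_r = sqrt(sum(r * r for _, _c, r in table))
    if norm_c == 0 or norm_r == 0:
        return 0.0  # a string shorter than n has no n-grams
    return dot / (norm_c * norm_r)


def rank_references(candidate, references, n=3):
    """Repeat the comparison for each reference string, best match first."""
    scored = [
        (ref, cosine_similarity(frequency_table(candidate, ref, n)))
        for ref in references
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_references(candidate, references)` returns the reference strings paired with their scores, highest first. How such scores would then drive the indexing of step f) is not specified in the claims and is left out of the sketch.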