PROBLEM TO BE SOLVED: To detect duplicate information even when the information registered in a database contains notational variations or the like, which would otherwise prevent the duplicates from being detected.;SOLUTION: A similarity calculation part 3 calculates the similarity between records read from the database 2 by using a conversion word dictionary 5 in which synonyms and omissible words are registered. The conversion word dictionary 5 is composed of a synonym dictionary and an omissible word dictionary. The synonym dictionary registers, for a given word, a representative word, that is, another word synonymous with it; the omissible word dictionary registers mutually omissible words. A duplicate candidate extraction part 6 extracts, as a duplicate record candidate, each pair of records whose similarity is equal to or above a predetermined threshold.;COPYRIGHT: (C)2006,JPO&NCIPI
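The flow described in the SOLUTION can be illustrated with a minimal sketch. It is not the method claimed here: the abstract does not state how similarity is computed, so Jaccard overlap of normalized tokens is assumed, and the dictionary entries, the function names (`normalize`, `similarity`, `duplicate_candidates`), and the default threshold of 0.8 are all hypothetical.

```python
from itertools import combinations

# Hypothetical conversion word dictionary; the abstract lists no actual entries.
SYNONYMS = {"corp.": "corporation", "corp": "corporation"}  # word -> representative word
OMISSIBLE = {"the", "ltd."}                                 # words that may be dropped


def normalize(record):
    """Lower-case, drop omissible words, map synonyms to their representative word."""
    tokens = record.lower().split()
    return frozenset(SYNONYMS.get(t, t) for t in tokens if t not in OMISSIBLE)


def similarity(a, b):
    """Jaccard similarity of the normalized token sets (an assumed measure)."""
    ta, tb = normalize(a), normalize(b)
    if not ta and not tb:
        return 1.0  # two empty records are treated as identical
    return len(ta & tb) / len(ta | tb)


def duplicate_candidates(records, threshold=0.8):
    """Return (i, j, score) for every record pair whose similarity >= threshold."""
    result = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = similarity(a, b)
        if score >= threshold:
            result.append((i, j, score))
    return result
```

With the hypothetical entries above, `duplicate_candidates(["The Acme Corp.", "Acme Corporation", "Globex Inc."])` reports records 0 and 1 as a candidate pair, because both normalize to the same token set. Comparing every pair is quadratic in the number of records; a real system would likely need some form of blocking or indexing, which the abstract does not discuss.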