Identification of words with the same base meaning is a necessary procedure for many algorithms of computational linguistics and text processing. We propose to use for this a knowledge-poor approach using an empirical formula based on the number of the coincident letters in the initial parts of the two words and the number of non-coincident letters in the final parts of these two words. To construct such a formula for a given language, we use inductive method of self-organization developed by A. Ivahnenko. This method considers a set of models (formulas) of a given class and selects the best ones using training samples and test samples. We give a detailed example for English. We also show how to apply the formula for creating word frequency list.
展开▼