A computer method is disclosed for analyzing text by employing a model known as a paradigm, which provides all the inflectional forms of a word. A file structure is created consisting of two components: a list of words (a dictionary), each word of which is associated with a set of paradigm references, and a file of paradigms, each consisting of grammatical categories paired with their corresponding endings or affixes (known as desinences) that specify tense, mood, number, gender, or other linguistic attributes. A computer method is also disclosed for generating the file structure of the dictionary by generating all forms of the words from a list of their standard forms (known as lemmas), a lemma generally being the infinitive of a verb or the singular form of a noun, with the forms of each lemma generated from its corresponding paradigms. The method sorts and organizes the resulting word list into a dictionary. An input data stream of natural language words can then be processed by generating a lemma for each input word. Conversely, the specific grammatical form of a word can be generated from its lemma and a grammatical category: the lemma is matched against the dictionary, and its paradigm references are used to access a set of paradigms. The desinences of the paradigms are then matched against the lemma, and the desinence corresponding to the specified grammatical category is selected. The specific grammatical form is generated by replacing the desinence of the lemma with the desinence of the desired grammatical form.
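
The steps described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the paradigm identifiers (V1, N1), the category names, the Spanish sample words, and every function name are assumptions introduced here for illustration, and the in-memory tables stand in for the dictionary and paradigm files.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical paradigm file: paradigm id -> {grammatical category: desinence}.
PARADIGMS: Dict[str, Dict[str, str]] = {
    "V1": {"infinitive": "ar", "pres_1sg": "o", "pres_3sg": "a", "pres_1pl": "amos"},
    "N1": {"singular": "", "plural": "s"},
}

# Assumption: each paradigm records which category carries the lemma's desinence.
LEMMA_CATEGORY: Dict[str, str] = {"V1": "infinitive", "N1": "singular"}

# Hypothetical lemma list: lemma -> set of paradigm references.
LEMMAS: Dict[str, List[str]] = {"hablar": ["V1"], "casa": ["N1"]}


def inflect(lemma: str, pid: str, category: str) -> Optional[str]:
    """Replace the lemma's desinence with the desinence of `category`."""
    paradigm = PARADIGMS[pid]
    lemma_desinence = paradigm[LEMMA_CATEGORY[pid]]
    # The paradigm applies only if it has the category and its lemma
    # desinence actually matches the end of the lemma.
    if category not in paradigm or not lemma.endswith(lemma_desinence):
        return None
    stem = lemma[: len(lemma) - len(lemma_desinence)]
    return stem + paradigm[category]


def generate_form(lemma: str, category: str) -> Optional[str]:
    """Look up the lemma, follow its paradigm references, return the first match."""
    for pid in LEMMAS.get(lemma, []):
        form = inflect(lemma, pid, category)
        if form is not None:
            return form
    return None


def build_form_index() -> Dict[str, List[Tuple[str, str]]]:
    """Generate every form of every lemma and sort the result into a dictionary."""
    index: Dict[str, List[Tuple[str, str]]] = {}
    for lemma, pids in LEMMAS.items():
        for pid in pids:
            for category in PARADIGMS[pid]:
                form = inflect(lemma, pid, category)
                if form is not None:
                    index.setdefault(form, []).append((lemma, category))
    return dict(sorted(index.items()))


def lemmatize(word: str, index: Dict[str, List[Tuple[str, str]]]) -> List[str]:
    """Return the candidate lemmas for an input word."""
    return sorted({lemma for lemma, _ in index.get(word, [])})


if __name__ == "__main__":
    print(generate_form("hablar", "pres_1pl"))      # hablamos
    print(lemmatize("casas", build_form_index()))   # ['casa']
```

In this sketch the generated index maps each inflected form directly to its lemma and category, a simplification of the disclosed structure, in which each dictionary word carries paradigm references instead.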