Paradigm-based morphological text analysis for natural languages

Abstract

A computer method is disclosed for analyzing text by means of a model known as a paradigm, which provides all the inflectional forms of a word. A file structure is created that consists of two components: a list of words (a dictionary), in which each word is associated with a set of paradigm references; and a file of paradigms, which pairs grammatical categories with their corresponding ending or affix portions (known as desinences) specifying tense, mood, number, gender or another linguistic attribute. A computer method is also disclosed for generating the dictionary file structure by generating all forms of the words from a list of standard forms of the words (known as lemmas), each lemma being accompanied by its corresponding paradigms. The lemma is generally the infinitive of a verb or the singular form of a noun. The method sorts the resulting word list and organizes it into a dictionary. An input data stream of natural-language words can then be processed by generating a lemma for each input word. The specific grammatical form of an input word can be generated from its standard form (the lemma) and a grammatical category: the lemma is matched against the dictionary, and its paradigm references are used to access a set of paradigms. The desinences of those paradigms are then matched against the lemma, and the desinence corresponding to the specified grammatical category is selected. The specific grammatical form is generated by replacing the desinence of the lemma with the desinence of the desired grammatical form.
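The pipeline in the abstract (expand lemmas through their paradigms into a sorted dictionary, lemmatize an input word, and regenerate a requested form by swapping desinences) can be illustrated with a minimal Python sketch. Everything concrete here is a hypothetical assumption for illustration, not the patent's actual file format: the paradigm names `AR_VERB`/`ER_VERB`, category labels such as `PRES_1SG`, the two Spanish sample verbs, and the helpers `build_dictionary`, `lookup`, `lemmatize` and `generate`. Paradigms are modeled as dicts from grammatical category to desinence, with the infinitive (`INF`) taken as the lemma.

```python
"""Toy sketch of paradigm-based lemmatization and form generation.

Hypothetical setup (not taken from the patent): a paradigm maps each
grammatical category to a desinence, the lemma is the "INF" form, and the
dictionary is a sorted list of (word form, paradigm references).
"""
from bisect import bisect_left

# Hypothetical paradigm file: grammatical category -> desinence.
PARADIGMS = {
    "AR_VERB": {"INF": "ar", "PRES_1SG": "o", "PRES_2SG": "as",
                "PRES_3SG": "a", "PRES_1PL": "amos"},
    "ER_VERB": {"INF": "er", "PRES_1SG": "o", "PRES_2SG": "es",
                "PRES_3SG": "e", "PRES_1PL": "emos"},
}
LEMMA_CATEGORY = "INF"  # assumed: the standard form (lemma) is the infinitive


def stem_of(word, desinence):
    """Strip `desinence` from `word`; return None if it does not fit."""
    if word.endswith(desinence) and len(word) > len(desinence):
        return word[: len(word) - len(desinence)]
    return None


def build_dictionary(lemma_list):
    """Expand each (lemma, paradigm names) pair into all its forms, then sort."""
    refs = {}
    for lemma, names in lemma_list:
        for name in names:
            endings = PARADIGMS[name]
            stem = stem_of(lemma, endings[LEMMA_CATEGORY])
            if stem is None:
                continue  # the lemma does not fit this paradigm
            for desinence in endings.values():
                refs.setdefault(stem + desinence, set()).add(name)
    # Each entry: (word form, sorted tuple of paradigm references).
    return sorted((form, tuple(sorted(names))) for form, names in refs.items())


def lookup(dictionary, word):
    """Binary-search the sorted dictionary; return paradigm refs or ()."""
    i = bisect_left(dictionary, (word,))
    if i < len(dictionary) and dictionary[i][0] == word:
        return dictionary[i][1]
    return ()


def lemmatize(dictionary, word):
    """Return the (lemma, category) analyses of an input word."""
    analyses = []
    for name in lookup(dictionary, word):
        endings = PARADIGMS[name]
        for category, desinence in endings.items():
            stem = stem_of(word, desinence)
            if stem is None:
                continue
            lemma = stem + endings[LEMMA_CATEGORY]
            # Keep the analysis only if the candidate lemma is itself a
            # dictionary entry that references the same paradigm.
            if name in lookup(dictionary, lemma):
                analyses.append((lemma, category))
    return analyses


def generate(dictionary, lemma, category):
    """Build the form of `lemma` for `category` by swapping desinences."""
    forms = []
    for name in lookup(dictionary, lemma):
        endings = PARADIGMS[name]
        stem = stem_of(lemma, endings[LEMMA_CATEGORY])
        if stem is not None and category in endings:
            forms.append(stem + endings[category])
    return forms


DICTIONARY = build_dictionary([("hablar", ["AR_VERB"]), ("comer", ["ER_VERB"])])
print(lemmatize(DICTIONARY, "hablamos"))          # [('hablar', 'PRES_1PL')]
print(generate(DICTIONARY, "comer", "PRES_2SG"))  # ['comes']
```

Keeping the dictionary as a sorted list lets lookups use binary search (`bisect_left`), loosely mirroring the sort-and-organize step the abstract describes. A word that carries several paradigm references simply yields several analyses in this toy version.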

Bibliographic data

Publication number: DE3853894T2

Publication date: 1995-12-14

Original format: PDF
Applicant/assignee: IBM (US)

Application number: DE19883853894T
Inventor: ZAMORA, ANTONIO (US)

Filing date: 1988-02-05
Classification: G06F17/20; G06F17/27
Country: DE