Division of Spanish Words into Morphemes with a Genetic Algorithm

Abstract

We discuss an unsupervised technique for determining the morpheme structure of words in an inflective language, with Spanish as a case study. We use global optimization (implemented with a genetic algorithm), whereas most previous work relies on heuristics calculated from conditional probabilities of word parts. Thus, we deal with the complete space of solutions and do not reduce it at the risk of eliminating some correct solutions beforehand. Also, we work at the derivational level, as contrasted with the more traditional grammatical level, which is concerned only with inflections. The algorithm works as follows. The input is a wordlist built from a large dictionary or corpus in the given language, and the output is the same wordlist with each word divided into morphemes. First, we build a redundant list of all strings that might possibly be prefixes, suffixes, and stems of the words in the wordlist. Then, we detect possible paradigms in this set and filter out all items from the lists of possible prefixes and suffixes (though not stems) that do not participate in such paradigms. Finally, a subset of those lists of possible prefixes, stems, and suffixes is chosen using the genetic algorithm. The fitness function is based on the idea of minimum description length, i.e., we choose the minimum number of elements necessary for covering all the words. The obtained subset is used for dividing the words in the wordlist. Algorithm parameters are presented. A preliminary evaluation of the experimental results for a dictionary of Spanish is given.
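The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no parameter values or paradigm definition, so every function name, the `max_affix` cap, the paradigm test (a base that occurs with at least two candidate affixes), the penalty weight, and the GA settings are assumptions chosen for illustration. The sketch also seeds the population with the trivial cover in which every whole word is its own stem, which is an added convenience and not something the abstract states.

```python
import random


def candidates(words, max_affix=3):
    """Redundant sets of every string that could be a prefix, stem, or suffix."""
    prefixes, stems, suffixes = set(), set(), set()
    for w in words:
        for i in range(min(max_affix, len(w) - 1) + 1):
            for j in range(max(i + 1, len(w) - max_affix), len(w) + 1):
                prefixes.add(w[:i])
                stems.add(w[i:j])
                suffixes.add(w[j:])
    return prefixes - {""}, stems, suffixes - {""}


def paradigm_filter(affixes, words, is_suffix):
    """Keep affixes that share a base with another affix (assumed paradigm test)."""
    by_base = {}
    for w in words:
        for a in affixes:
            if len(a) < len(w) and (w.endswith(a) if is_suffix else w.startswith(a)):
                base = w[:-len(a)] if is_suffix else w[len(a):]
                by_base.setdefault(base, set()).add(a)
    keep = set()
    for group in by_base.values():
        if len(group) >= 2:
            keep |= group
    return keep


def decode(bits, genes):
    """Turn a bit string into the selected prefix, stem, and suffix sets."""
    chosen = {"prefix": {""}, "stem": set(), "suffix": {""}}  # empty affixes are free
    for (kind, s), bit in zip(genes, bits):
        if bit:
            chosen[kind].add(s)
    return chosen["prefix"], chosen["stem"], chosen["suffix"]


def split(word, prefixes, stems, suffixes):
    """Return the first prefix/stem/suffix split allowed by the sets, or None."""
    for i in range(len(word)):
        for j in range(i + 1, len(word) + 1):
            if word[:i] in prefixes and word[i:j] in stems and word[j:] in suffixes:
                return word[:i], word[i:j], word[j:]
    return None


def cost(bits, genes, words):
    """MDL-style cost (lower is better): elements used plus a penalty per uncovered word."""
    prefixes, stems, suffixes = decode(bits, genes)
    uncovered = sum(split(w, prefixes, stems, suffixes) is None for w in words)
    return sum(bits) + (len(genes) + 1) * uncovered


def evolve(words, genes, pop_size=40, generations=60, p_mut=0.02, seed=0):
    """Plain generational GA with elitism, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    # Assumption: seed one individual that uses each whole word as a stem.
    trivial = [int(kind == "stem" and s in words) for kind, s in genes]
    pop = [trivial] + [[rng.randint(0, 1) for _ in genes] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda ind: cost(ind, genes, words))
        parents = pop[: pop_size // 2]
        children = [pop[0]]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(genes))
            children.append([bit ^ (rng.random() < p_mut) for bit in a[:cut] + b[cut:]])
        pop = children
    return min(pop, key=lambda ind: cost(ind, genes, words))


def segment(words, **ga_options):
    """Run the full sketch: candidates, paradigm filter, GA selection, splitting."""
    words = set(words)
    prefixes, stems, suffixes = candidates(words)
    genes = (
        [("prefix", s) for s in sorted(paradigm_filter(prefixes, words, False))]
        + [("stem", s) for s in sorted(stems)]
        + [("suffix", s) for s in sorted(paradigm_filter(suffixes, words, True))]
    )
    best = evolve(words, genes, **ga_options)
    chosen = decode(best, genes)
    return {w: split(w, *chosen) for w in sorted(words)}, cost(best, genes, words)


if __name__ == "__main__":
    demo = ["canto", "cantas", "canta", "hablo", "hablas", "habla"]
    segmentation, best_cost = segment(demo)
    for word, parts in segmentation.items():
        print(word, parts)
    print("cost:", best_cost)
```

Because the penalty for an uncovered word exceeds the total number of candidate elements, any solution that covers the whole wordlist beats any that does not. Stems are never filtered, so the whole-word cover is always available as a fallback, mirroring the abstract's remark that the paradigm filter applies to prefixes and suffixes but not to stems.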

Bibliographic record

Source: International Conference on Applications of Natural Language to Information Systems, 2008, 8 pages
Authors: Alexander Gelbukh; Grigori Sidorov; Diego Lara-Reyes; Liliana Chanona-Hernandez
Original format: PDF
CLC classification: TP3-53

Similar literature

1. Hybrid algorithm for the classification of prostate cancer patients of the MCC-Spain study based on support vector machines and genetic algorithms [J]. Sanchez Lasheras Juan Enrique, Sanchez Lasheras Fernando, Gonzalez Donquiles Carmen. Neurocomputing. 2021, No. Sepa10
2. Automatic selection of lexical features for detecting Alzheimer's disease using bag-of-words model and genetic algorithm [J]. Gang Lyu, Aimei Dong. International Journal of Computer Applications in Technology. 2019, No. 4
3. EMdeCODE: a novel algorithm capable of reading words of epigenetic code to predict enhancers and retroviral integration sites and to identify H3R2me1 as a distinctive mark of coding versus non-coding genes [J]. Federico Andrea Santoni. Nucleic Acids Research. 2013, No. 3
4. Division of Spanish Words into Morphemes with a Genetic Algorithm [C]. Alexander Gelbukh, Grigori Sidorov, Diego Lara-Reyes. Natural Language Processing and Information Systems. 2008
5. Inventory simulation and optimization using system dynamics, structural modeling equations and genetic algorithms in the drivetrain division of an automotive manufacturer [D]. Sisfontes-Monge, Marco. 2005
6. EMdeCODE: a novel algorithm capable of reading words of epigenetic code to predict enhancers and retroviral integration sites and to identify H3R2me1 as a distinctive mark of coding versus non-coding genes [O]. Federico Andrea Santoni. 2013
7. Division of Spanish Words into Morphemes with a Genetic Algorithm∗ [O]. Alexander Gelbukh, Grigori Sidorov, Diego Lara-Reyes. 2013
8. Automatic Word Categorization with Genetic Algorithms [R]. Lankhorst, M. M. 1994
