Conference: International Conference on Technologies and Applications of Artificial Intelligence

Probabilistic Segmentation of Word Forms into Affixes and Word Roots

Abstract

This paper introduces a method for segmenting a given word into word parts, including affixes, word stems, and word roots. In our approach, word parts, including affixes and word roots, are counted in a given training dataset, and the relevant probability values are estimated. The method involves training a probabilistic model on a set of annotated word segmentations, finding the most probable word stem and affixes, and finally segmenting the word stem further into word roots. At run-time, we first strip the affixes off the given word to derive the stem. Then we segment the stem into word roots. We enumerate all possible segmentations, and the most probable one is returned as the best morphological segmentation of the given word. Moreover, we adjust our probabilistic model by considering the rules for adding suffixes to word roots and the positions of prefixes and suffixes in a word. Preliminary evaluation shows that the proposed method is competitive with previous work.
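
The abstract does not give the model's exact form, so the following is only a minimal sketch of the general idea it describes: count word parts in annotated segmentations, estimate probabilities, enumerate every candidate segmentation, and return the most probable one. Several things here are assumptions made for illustration: the toy `TRAINING` data, the function names, unsmoothed relative-frequency estimates, and a single-pass search that enforces a "prefixes, then roots, then suffixes" ordering instead of the two-stage affix stripping and stem segmentation the paper describes.

```python
from collections import Counter
from itertools import combinations
from math import log

# Hypothetical toy annotations (not from the paper): each word is a list of
# (part, kind) pairs, where kind is "pre" (prefix), "root", or "suf" (suffix).
TRAINING = [
    [("un", "pre"), ("help", "root"), ("ful", "suf")],
    [("re", "pre"), ("play", "root")],
    [("play", "root"), ("ful", "suf")],
    [("help", "root"), ("less", "suf")],
    [("photo", "root"), ("graph", "root")],
    [("un", "pre"), ("do", "root")],
]

def train(annotated):
    """Estimate P(part | kind) by relative frequency (no smoothing)."""
    counts = Counter(pair for word in annotated for pair in word)
    totals = Counter()
    for (_, kind), n in counts.items():
        totals[kind] += n
    return {pair: n / totals[pair[1]] for pair, n in counts.items()}

def splits(word):
    """Yield every way to cut word into contiguous, non-empty parts."""
    n = len(word)
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [word[a:b] for a, b in zip(bounds, bounds[1:])]

def labelings(parts):
    """Yield kind sequences of the form prefix* root+ suffix*."""
    m = len(parts)
    for first_root in range(m):
        for after_roots in range(first_root + 1, m + 1):
            yield (["pre"] * first_root
                   + ["root"] * (after_roots - first_root)
                   + ["suf"] * (m - after_roots))

def segment(word, probs):
    """Return the highest-probability labeled segmentation, or None."""
    best, best_score = None, float("-inf")
    for parts in splits(word):
        for kinds in labelings(parts):
            pairs = list(zip(parts, kinds))
            if any(pair not in probs for pair in pairs):
                continue  # unseen part: skipped here; a real model would smooth
            score = sum(log(probs[pair]) for pair in pairs)
            if score > best_score:
                best, best_score = pairs, score
    return best
```

With this toy data, `segment("unhelpful", train(TRAINING))` returns `[('un', 'pre'), ('help', 'root'), ('ful', 'suf')]`, and a word with no known parts returns `None`. Exhaustive enumeration is exponential in word length, which is tolerable for single words but is a simplification chosen for clarity.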
