Conference: International Conference on Asian Language Processing

A Rule and Statistical Modeling Based Stem Extraction Method for Kazakh Words


Abstract

Kazakh is an agglutinative language with complex morphological changes. Extracting Kazakh stems and affixes is therefore important for Kazakh information processing. In this paper, based on the morphological structure of Kazakh words, we apply a stem extraction method that combines lexical rules with a statistical model. Stem extraction is carried out using a prefix dictionary, a suffix dictionary, a stem dictionary, a statistical model dictionary, and a rule base. Experimental results show that, within the statistical model, extracting the stem using part-of-speech features is effective: the word-level accuracy and the stem-level accuracy of this method reached 0.93% and 76.74%, respectively.
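As a rough illustration of how dictionary lookups and a statistical score can be combined, the following minimal Python sketch strips known suffixes, keeps the candidates found in a stem dictionary, and breaks ties by frequency. It is not the paper's implementation: the suffix list, stem dictionary, and counts are toy values invented for this example, the names `candidate_stems` and `extract_stem` are hypothetical, and prefixes, the rule base, and part-of-speech features are left out.

```python
from typing import Dict, List, Set

# Toy resources invented for illustration; they are NOT the paper's dictionaries.
STEMS: Set[str] = {"бала", "кітап", "мектеп"}
SUFFIXES: List[str] = [
    "лар", "лер", "тар", "тер",  # a few plural endings
    "да", "де", "та", "те",      # a few locative endings
    "ға", "ге", "қа", "ке",      # a few dative endings
]
# Hypothetical counts standing in for a statistical model dictionary.
STEM_COUNTS: Dict[str, int] = {"бала": 120, "кітап": 80, "мектеп": 60}


def candidate_stems(word: str) -> List[str]:
    """Return the word plus every string reachable by stripping known suffixes."""
    seen = {word}
    frontier = [word]
    while frontier:
        current = frontier.pop()
        for suffix in SUFFIXES:
            # Never strip a suffix that would leave an empty string.
            if current.endswith(suffix) and len(current) > len(suffix):
                shorter = current[: -len(suffix)]
                if shorter not in seen:
                    seen.add(shorter)
                    frontier.append(shorter)
    return sorted(seen, key=len)


def extract_stem(word: str) -> str:
    """Pick the most frequent candidate found in the stem dictionary.

    Falls back to the unchanged word when no candidate is a known stem.
    """
    known = [c for c in candidate_stems(word) if c in STEMS]
    if not known:
        return word
    # Higher count wins; among equal counts, the shorter candidate wins.
    return max(known, key=lambda c: (STEM_COUNTS.get(c, 0), -len(c)))


if __name__ == "__main__":
    for w in ["балалар", "мектептерге", "кітапта"]:
        print(w, "->", extract_stem(w))
```

Stripping suffixes repeatedly lets the sketch handle stacked endings such as a plural followed by a case marker. The dictionary check prevents over-stripping when a stem happens to end in a suffix-like string.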
