Published in: Journal of information and computational science

Automatic Part-of-speech Tagging for Oromo Language Using Maximum Entropy Markov Model (MEMM)

Abstract

Part-of-speech (POS) tagging is a basic task in natural language processing and computational linguistics that has to be addressed for every natural language. In this paper, we present experimental results for tagging the Oromo language with the Maximum Entropy Markov Model (MEMM), one of the state-of-the-art probabilistic models for sequence classification. The model assigns a part-of-speech tag to each word or token of a sentence, taking many features and contexts into account. In our experiments, the MEMM was found to be the best way to estimate the word classes of Oromo text. The experiments were conducted on a manually annotated Oromo corpus of 452 sentences (6094 words in total). The results show that the algorithm performs well, with an accuracy of 93.01% under tenfold cross-validation. These results suggest that the MEMM has some advantages over Hidden Markov Models for sequence tagging, since it offers more freedom in choosing the features that represent observations for POS tagging of the Oromo language.
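
To make the idea concrete, the following is a minimal sketch of a MEMM-style tagger, not the authors' implementation. It assumes a locally normalized maximum entropy (softmax) classifier over overlapping observation features plus the previous tag, trained here with plain stochastic gradient ascent and decoded with Viterbi search. The names `features`, `MEMMTagger` and `TOY_DATA` are hypothetical, the feature set is an illustrative guess, and the toy English sentences are placeholders rather than data from the Oromo corpus.

```python
import math
from collections import defaultdict


def features(words, i, prev_tag):
    """Overlapping observation features plus the previous tag (the Markov part)."""
    w = words[i].lower()
    prev_word = words[i - 1].lower() if i > 0 else "<s>"
    return ["bias", "word=" + w, "suffix2=" + w[-2:],
            "prev_word=" + prev_word, "prev_tag=" + prev_tag]


class MEMMTagger:
    def __init__(self, tags):
        self.tags = list(tags)
        self.weights = defaultdict(float)  # (feature, tag) -> weight

    def log_probs(self, feats):
        """log P(tag | features) for every tag, normalized per position."""
        scores = [sum(self.weights[(f, t)] for f in feats) for t in self.tags]
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        return [s - log_z for s in scores]

    def train(self, sentences, epochs=30, lr=0.5):
        """Stochastic gradient ascent on the conditional log-likelihood."""
        for _ in range(epochs):
            for words, gold in sentences:
                prev = "<s>"
                for i, gold_tag in enumerate(gold):
                    feats = features(words, i, prev)
                    lps = self.log_probs(feats)
                    for t, lp in zip(self.tags, lps):
                        grad = (1.0 if t == gold_tag else 0.0) - math.exp(lp)
                        for f in feats:
                            self.weights[(f, t)] += lr * grad
                    prev = gold_tag  # condition on the gold previous tag

    def tag(self, words):
        """Viterbi search for the most probable tag sequence."""
        best = {"<s>": (0.0, [])}  # last tag -> (log-probability, path)
        for i in range(len(words)):
            new_best = {}
            for prev, (score, path) in best.items():
                lps = self.log_probs(features(words, i, prev))
                for t, lp in zip(self.tags, lps):
                    cand = score + lp
                    if t not in new_best or cand > new_best[t][0]:
                        new_best[t] = (cand, path + [t])
            best = new_best
        return max(best.values(), key=lambda item: item[0])[1]


# Toy placeholder data (NOT from the paper's corpus).
TOY_DATA = [
    (["the", "dog", "runs"], ["DET", "NOUN", "VERB"]),
    (["a", "cat", "sleeps"], ["DET", "NOUN", "VERB"]),
    (["the", "cat", "runs"], ["DET", "NOUN", "VERB"]),
    (["a", "dog", "sleeps"], ["DET", "NOUN", "VERB"]),
]

tagger = MEMMTagger(["DET", "NOUN", "VERB"])
tagger.train(TOY_DATA)
print(tagger.tag(["the", "dog", "runs"]))
```

Because each position is scored by a conditional classifier, arbitrary overlapping features (suffixes, neighbouring words, capitalization and so on) can be added to `features` without changing the training or decoding code. This is the freedom in feature choice that the abstract contrasts with Hidden Markov Models.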
