International Conference on Informatics and Computational Sciences

Morphology analysis for Hidden Markov Model based Indonesian part-of-speech tagger

Abstract

Part-of-speech (POS) tagging plays an important role in natural language processing (NLP). It assigns each word a tag such as noun, verb, or pronoun. Many POS tagging approaches have been developed to replace manual tagging, which is time-consuming. The Hidden Markov Model (HMM) is a statistical method that is widely used for POS tagging. For Indonesian, the HMM has been improved with an affix tree method that handles out-of-vocabulary (OOV) words and affixation. However, the affix tree provides no information for handling clitics. This study therefore proposes morphological analysis for Indonesian POS tagging: we combine MorphInd, a morphological analyzer, with an HMM to improve tagging performance. The experiments use 10,000 tokens for training and 3,000 tokens for testing. We prepare three test corpora containing 10%, 20%, and 30% OOV words, respectively. The experimental results show that the proposed method outperforms the other methods it was compared against.
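
To make the idea concrete, here is a minimal sketch of a bigram HMM tagger with Viterbi decoding and a clitic-aware fallback for OOV words. It is not the authors' implementation and does not call MorphInd: the function `strip_enclitic` is a hypothetical stand-in that only strips the enclitics "-nya", "-ku", and "-mu", and the three-tag corpus, the add-one smoothing, and all function names are assumptions made for illustration.

```python
import math
from collections import defaultdict

def train_hmm(tagged_sentences):
    """Count tag-bigram transitions and word emissions from tagged sentences."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    for sent in tagged_sentences:
        prev = "<s>"
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word.lower()] += 1
            prev = tag
    return trans, emit

def log_prob(counts, key, n_outcomes, alpha=1.0):
    """Add-alpha smoothed log probability (the smoothing scheme is an assumption)."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + alpha) / (total + alpha * n_outcomes))

def strip_enclitic(word, vocab):
    """Hypothetical stand-in for a morphology analyzer: map an OOV word to a
    known stem by removing one enclitic, or return None if nothing matches."""
    for clitic in ("nya", "ku", "mu"):
        stem = word[: -len(clitic)]
        if word.endswith(clitic) and stem in vocab:
            return stem
    return None

def viterbi(words, trans, emit, oov_stem=None):
    """Return the most probable tag sequence for `words` under the bigram HMM."""
    if not words:
        return []
    tags = sorted(emit)
    vocab = {w for t in tags for w in emit[t]}
    n_vocab = len(vocab) + 1  # one extra slot for unknown words

    def emission(tag, word):
        w = word.lower()
        if w not in vocab and oov_stem is not None:
            w = oov_stem(w, vocab) or w  # score the stem when one is found
        return log_prob(emit[tag], w, n_vocab)

    # best[i][t] = (log score of the best path ending in tag t at position i, previous tag)
    best = [{t: (log_prob(trans["<s>"], t, len(tags)) + emission(t, words[0]), None)
             for t in tags}]
    for word in words[1:]:
        col = {}
        for t in tags:
            score, prev = max((best[-1][p][0] + log_prob(trans[p], t, len(tags)), p)
                              for p in tags)
            col[t] = (score + emission(t, word), prev)
        best.append(col)

    # Backtrack from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for col in reversed(best[1:]):
        tag = col[tag][1]
        path.append(tag)
    return path[::-1]

# Toy training data (invented for illustration, not from the paper's corpus).
corpus = [
    [("saya", "PRON"), ("membaca", "VERB"), ("buku", "NOUN")],
    [("dia", "PRON"), ("membeli", "VERB"), ("rumah", "NOUN")],
]
trans, emit = train_hmm(corpus)

# "bukunya" (buku + enclitic -nya) is OOV; the fallback scores it as "buku".
print(viterbi(["dia", "membaca", "bukunya"], trans, emit, strip_enclitic))
```

In this sketch the fallback changes only the emission term: an unseen word with a recognizable enclitic borrows the emission counts of its stem, while the transition model is left untouched. A full morphological analyzer would typically also handle prefixes, proclitics, and the tag contributed by the clitic itself, none of which this toy version attempts.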
