2017 1st International Conference on Informatics and Computational Sciences

Morphology analysis for Hidden Markov Model based Indonesian part-of-speech tagger



Abstract

Part-of-Speech (POS) tagging plays an important role in Natural Language Processing (NLP). It assigns each word a tag such as noun, verb, or pronoun. Many POS tagging approaches have been developed to replace manual POS tagging, which is a time-consuming task. The Hidden Markov Model (HMM) is a statistical method that is widely used for POS tagging. For the Indonesian language, HMM has been improved with the affix tree method, which handles Out-of-Vocabulary (OOV) words and affixation. However, the affix tree does not provide any information for handling clitics. This study therefore proposes morphology analysis for Indonesian POS tagging. We combine MorphInd as the morphology analyzer with HMM to improve the performance of POS tagging. The experiment uses 10,000 tokens for training and 3,000 tokens for testing. We prepare three different test corpora, containing 10%, 20%, and 30% OOV words respectively. The experimental results show that the proposed method achieves better performance than the other methods.
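The general idea in the abstract, an HMM tagger whose handling of OOV words is guided by morphological information, can be illustrated with a minimal sketch. This is not the authors' implementation and it does not call MorphInd. The function `candidate_tags_for_oov` is a hypothetical stand-in for a morphology analyzer, and the clitic list, prefix rule, penalty value, and add-alpha smoothing are all assumptions made for illustration. The helper `oov_rate` shows one plausible way to measure the share of OOV tokens in a test set.

```python
import math
from collections import Counter, defaultdict

CLITICS = ("nya", "ku", "mu")        # common Indonesian enclitics
VERB_PREFIXES = ("me", "ber", "di")  # toy rule for this sketch only (assumption)
OOV_PENALTY = -20.0                  # log-score for tags the analysis rules out (assumption)


def train_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    for sentence in tagged_sentences:
        prev = "<s>"
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word.lower()] += 1
            prev = tag
    return trans, emit


def candidate_tags_for_oov(word, emit):
    """Hypothetical stand-in for a morphology analyzer (not MorphInd)."""
    for clitic in CLITICS:
        stem = word[: -len(clitic)]
        if word.endswith(clitic) and stem:
            # If the host word is known, reuse the tags it was seen with.
            known = {t for t in emit if stem in emit[t]}
            if known:
                return known
    if word.startswith(VERB_PREFIXES):
        return {"VERB"}
    return set()


def oov_rate(tokens, emit):
    """Fraction of tokens that never occurred in the training data."""
    vocab = {w for t in emit for w in emit[t]}
    return sum(w.lower() not in vocab for w in tokens) / len(tokens)


def viterbi(words, trans, emit, tags, alpha=1.0):
    """Most likely tag sequence under add-alpha smoothed HMM scores."""
    if not words:
        return []
    vocab = {w for t in tags for w in emit[t]}
    vocab_size = len(vocab) + 1  # one extra slot for unseen words

    def log_trans(prev, tag):
        total = sum(trans[prev].values())
        return math.log((trans[prev][tag] + alpha) / (total + alpha * len(tags)))

    def log_emit(tag, word):
        w = word.lower()
        if w not in vocab:
            candidates = candidate_tags_for_oov(w, emit)
            if candidates:
                # A score rather than a true probability: keep only analyzed tags.
                return 0.0 if tag in candidates else OOV_PENALTY
        total = sum(emit[tag].values())
        return math.log((emit[tag][w] + alpha) / (total + alpha * vocab_size))

    # best[tag] = (log-score, tag path) of the best sequence ending in tag
    best = {t: (log_trans("<s>", t) + log_emit(t, words[0]), [t]) for t in tags}
    for word in words[1:]:
        new_best = {}
        for t in tags:
            score, path = max(
                ((best[p][0] + log_trans(p, t), best[p][1]) for p in tags),
                key=lambda item: item[0],
            )
            new_best[t] = (score + log_emit(t, word), path + [t])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


# Tiny made-up training set, for illustration only.
train = [
    [("saya", "PRON"), ("membaca", "VERB"), ("buku", "NOUN")],
    [("dia", "PRON"), ("menulis", "VERB"), ("surat", "NOUN")],
]
trans, emit = train_hmm(train)
tags = sorted(emit)
print(viterbi(["saya", "membeli", "bukunya"], trans, emit, tags))
# → ['PRON', 'VERB', 'NOUN']
```

In this toy run, "bukunya" is OOV, but stripping the clitic "-nya" recovers the known host word "buku", so the tagger reuses its noun tag. A full morphology analyzer would supply far richer analyses than these two hand-written rules.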
