Computer Science & Information Technology

Improving Rule-Based Method for Arabic POS Tagging Using HMM Technique

Abstract

A part-of-speech (POS) tagger plays an important role in natural language applications such as speech recognition, natural language parsing, information retrieval and multi-word term extraction. This study proposes an efficient and accurate POS tagging technique for the Arabic language using a statistical approach. The Arabic rule-based method suffers from misclassified and unanalyzed words because of ambiguity. To overcome these two problems, we propose a Hidden Markov Model (HMM) integrated with the Arabic rule-based method. Our POS tagger uses a set of four POS tags: Noun, Verb, Particle, and Quranic Initial (INL). The proposed technique exploits the contextual information of words together with a variety of features that help predict the POS classes. To evaluate its accuracy, the proposed method was trained and tested on the Holy Quran Corpus, which contains 77,430 terms of undiacritized Classical Arabic. The experimental results demonstrate the efficiency of our method for Arabic POS tagging: it achieves an accuracy of 97.6%, compared with 94.4% for the rule-based tagger.
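The abstract does not say exactly how the rule-based output and the HMM are combined. The sketch below assumes one plausible scheme: when the rule-based analyzer proposes tags for a word, Viterbi decoding is restricted to those tags; when a word is unanalyzed, the HMM chooses among all four tags (Noun, Verb, Particle, INL). The names `viterbi`, `allowed` and `unk` are hypothetical, and every probability is an invented toy value rather than an estimate from the Quran corpus. The example words are the first five undiacritized words of Surah al-Baqarah, and their "rule-based" tags are simply assumed for illustration.

```python
import math

# The four tag classes named in the abstract.
TAGS = ["Noun", "Verb", "Particle", "INL"]

# Toy HMM parameters: invented for illustration, NOT estimated from the
# Quranic corpus. Each TRANS row sums to 1, and every transition is nonzero
# so math.log never receives 0.
START = {"Noun": 0.4, "Verb": 0.3, "Particle": 0.2, "INL": 0.1}
TRANS = {
    "Noun":     {"Noun": 0.40, "Verb": 0.20, "Particle": 0.35, "INL": 0.05},
    "Verb":     {"Noun": 0.50, "Verb": 0.10, "Particle": 0.35, "INL": 0.05},
    "Particle": {"Noun": 0.50, "Verb": 0.40, "Particle": 0.05, "INL": 0.05},
    "INL":      {"Noun": 0.60, "Verb": 0.20, "Particle": 0.15, "INL": 0.05},
}
EMIT = {
    "Noun": {"ذلك": 0.2, "الكتاب": 0.3, "ريب": 0.2},
    "Verb": {},
    "Particle": {"لا": 0.5},
    "INL": {"الم": 0.9},
}


def viterbi(words, start_p, trans_p, emit_p, allowed=None, unk=1e-6):
    """Most probable tag sequence for `words` under a first-order HMM.

    `allowed` (hypothetical hybrid step): one entry per word holding the tag
    set proposed by a rule-based analyzer; None or an empty set marks an
    unanalyzed word, for which all TAGS are considered. `unk` is a crude
    floor probability for unseen (tag, word) emissions.
    """
    if not words:
        return []

    def candidates(i):
        if allowed is not None and allowed[i]:
            return allowed[i]
        return TAGS

    def log_emit(tag, word):
        return math.log(emit_p[tag].get(word, unk))

    # score[i][t]: best log-probability of any tag path ending in tag t at word i
    score = [{t: math.log(start_p[t]) + log_emit(t, words[0]) for t in candidates(0)}]
    back = [{t: None for t in score[0]}]
    for i in range(1, len(words)):
        score.append({})
        back.append({})
        for t in candidates(i):
            prev, best = max(
                ((p, s + math.log(trans_p[p][t])) for p, s in score[i - 1].items()),
                key=lambda pair: pair[1],
            )
            score[i][t] = best + log_emit(t, words[i])
            back[i][t] = prev

    # Follow back-pointers from the best-scoring final tag.
    path = [max(score[-1], key=score[-1].get)]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


if __name__ == "__main__":
    words = ["الم", "ذلك", "الكتاب", "لا", "ريب"]
    # Pretend the rule-based step tagged words 0, 1 and 3 and left 2 and 4 unanalyzed.
    rules = [{"INL"}, {"Noun"}, None, {"Particle"}, None]
    print(viterbi(words, START, TRANS, EMIT, allowed=rules))
```

With these toy numbers the script prints `['INL', 'Noun', 'Noun', 'Particle', 'Noun']`. In this design the rule-based tags only prune the candidate set, so the HMM's context model decides just the words the rules leave open. Scores are kept in log space to avoid underflow on long verses.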