Journal: International Journal of Intelligent Enterprise

Training and evaluation of TreeTagger on Amazigh corpus


Abstract

Part-of-speech (POS) tagging is of high importance in natural language processing (NLP). POS tagging assigns each token a grammatical category, such as noun, verb, or adjective, together with features such as person and gender. Some words are ambiguous in their category, and tagging resolves that ambiguity according to context. Many taggers have been designed, using different approaches, to reach high accuracy. In this paper we present a machine learning algorithm that combines a decision tree model and a hidden Markov model (HMM) to tag unknown Amazigh words. For statistical methods such as TreeTagger, this also brings practical advantages. The paper presents the creation of a POS-tagged corpus and the evaluation of TreeTagger on Amazigh text. Experiments on Amazigh text show that TreeTagger achieves an overall tagging accuracy of 93.19%: 94.10% on known words and 70.29% on unknown words.
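The three accuracy figures in the abstract can be related with a short sketch. It is not the authors' evaluation script. It assumes that a "known" word is one that appears in the training vocabulary and that overall accuracy is the token-weighted average of known-word and unknown-word accuracy. The function names, the placeholder tokens `w1` to `w6`, and the tag labels are hypothetical and chosen only for illustration.

```python
from typing import Dict, Sequence, Set


def tagging_accuracy(
    tokens: Sequence[str],
    gold: Sequence[str],
    predicted: Sequence[str],
    train_vocab: Set[str],
) -> Dict[str, float]:
    """Return overall, known-word, and unknown-word accuracy as fractions.

    Assumption: a token is "known" if it occurs in train_vocab.
    """
    # Each entry holds [correct_count, total_count].
    counts = {"overall": [0, 0], "known": [0, 0], "unknown": [0, 0]}
    for tok, g, p in zip(tokens, gold, predicted):
        bucket = "known" if tok in train_vocab else "unknown"
        for key in ("overall", bucket):
            counts[key][0] += int(g == p)
            counts[key][1] += 1
    # An empty bucket yields NaN rather than a division error.
    return {k: (c / n if n else float("nan")) for k, (c, n) in counts.items()}


def implied_unknown_rate(overall: float, known: float, unknown: float) -> float:
    """Share of unknown tokens implied by the three accuracies.

    Assumption: overall = (1 - u) * known + u * unknown, solved for u.
    """
    return (known - overall) / (known - unknown)


# Toy data (hypothetical): w5 and w6 are absent from the training vocabulary.
tokens = ["w1", "w2", "w3", "w4", "w5", "w6"]
gold = ["N", "V", "N", "ADJ", "V", "N"]
predicted = ["N", "V", "N", "N", "V", "ADJ"]
vocab = {"w1", "w2", "w3", "w4"}

scores = tagging_accuracy(tokens, gold, predicted, vocab)
print(scores)

# Figures reported in the abstract, in percent.
print(implied_unknown_rate(93.19, 94.10, 70.29))
```

Under the weighted-average assumption, the reported figures imply that roughly 3.8% of the test tokens were unknown words. The abstract does not state this share, so it should be read as a consistency check only.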
