Conference: International Conference on Intelligent Text Processing and Computational Linguistics

A Hybrid Approach to the Development of Part-of-Speech Tagger for Kafi-noonoo Text



Abstract

Although natural language processing (NLP) is now a popular area of research and development, less-resourced languages receive little attention from developers. One such under-resourced language is Kafi-noonoo, which is spoken in the south-western regions of Ethiopia. This paper presents the development of a part-of-speech tagger for Kafi-noonoo. To develop the tagger, we employed a hybrid of two systems: a statistical tagger and a rule-based tagger. The lexical and transition probabilities of word classes are modeled using a hidden Markov model (HMM). However, because the corpus available for the language is limited, a set of transformation rules is applied to the HMM output to improve the result. The system was evaluated on a test corpus and, with 90% of the corpus used for training, the hybrid tagger yielded an accuracy of 80.47%.
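The hybrid pipeline described in the abstract (HMM tagging followed by corrective transformation rules) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy English corpus, the three-tag tagset, add-one smoothing, the single hand-written rule, and the function names `train_hmm`, `viterbi`, `apply_rules`, and `hybrid_tag`. The paper's actual tagset, smoothing, and rule set are not stated in the abstract.

```python
import math
from collections import Counter, defaultdict

# Toy English corpus, purely illustrative; it is NOT Kafi-noonoo data.
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]

def train_hmm(sentences):
    """Collect tag-transition counts and word-emission (lexical) counts."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    vocab = set()
    for sent in sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab

def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (an assumed smoothing choice)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(words, model):
    """Return the most probable tag sequence under the HMM."""
    if not words:
        return []
    trans, emit, tags, vocab = model
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 slot for unknown words
    # Each column maps tag -> (best log score, backpointer to previous tag).
    cols = [{t: (log_prob(trans["<s>"], t, n_tags)
                 + log_prob(emit[t], words[0], n_words), None) for t in tags}]
    for word in words[1:]:
        col = {}
        for t in tags:
            score, prev = max(
                (cols[-1][p][0] + log_prob(trans[p], t, n_tags), p) for p in tags
            )
            col[t] = (score + log_prob(emit[t], word, n_words), prev)
        cols.append(col)
    # Follow backpointers from the best final tag.
    tag = max(tags, key=lambda t: cols[-1][t][0])
    path = [tag]
    for col in reversed(cols[1:]):
        tag = col[tag][1]
        path.append(tag)
    return path[::-1]

# Hypothetical transformation rule: (from_tag, to_tag, required previous tag).
RULES = [("NOUN", "VERB", "NOUN")]

def apply_rules(tags, rules):
    """Rewrite tags left to right wherever a rule's context matches."""
    out = list(tags)
    for from_tag, to_tag, prev_tag in rules:
        for i in range(1, len(out)):
            if out[i] == from_tag and out[i - 1] == prev_tag:
                out[i] = to_tag
    return out

def hybrid_tag(words, model, rules=RULES):
    """HMM decoding followed by rule-based correction."""
    return apply_rules(viterbi(words, model), rules)

MODEL = train_hmm(TRAIN)
print(hybrid_tag(["the", "dog", "sleeps"], MODEL))
```

In this sketch the rules act only as a post-processing pass over the HMM output, which is one plausible reading of "hybrid"; with a small training corpus, such rules can correct systematic errors that sparse probability estimates leave behind.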
