Published in: Journal of Theoretical and Applied Information Technology

DESIGN AND DEVELOPMENT OF DICTIONARY-BASED STEMMER FOR THE URDU LANGUAGE


Abstract

Stemming reduces the variant forms of a word to its base, stem, or root form, a step that is essential for many language processing applications, including Urdu information retrieval (IR). Urdu is a resource-poor and morphologically rich language, and its multilingual vocabulary is challenging to process because of this complex morphology. Research on Urdu stemming is about a decade old, yet no work has been reported on dictionary-based Urdu stemming. The present work introduces a dictionary-based Urdu stemmer with improved performance compared to existing Urdu stemmers. The significance of the study is that it identifies the dictionary-based approach as the most promising approach for Urdu stemming, especially when combined with a dictionary update feature. Testing shows 94.85% overall accuracy on test data, and the results can be improved further by cleaning the test data and updating the dictionary.
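The abstract gives no implementation details, so the following is only a minimal sketch of what dictionary lookup with an update feature might look like. The class name `DictionaryStemmer`, the `accuracy` helper, the choice to return unknown words unchanged, and the two toy Urdu entries are all assumptions made for illustration; none of them come from the paper or its dictionary.

```python
from typing import Dict, Iterable, Tuple


class DictionaryStemmer:
    """Hypothetical dictionary-based stemmer: known word forms map to stems by lookup."""

    def __init__(self, entries: Dict[str, str]) -> None:
        # Copy the mapping so later updates do not mutate the caller's dict.
        self._stems = dict(entries)

    def stem(self, word: str) -> str:
        # Assumption of this sketch: a word missing from the dictionary
        # is returned unchanged rather than guessed at.
        return self._stems.get(word, word)

    def update(self, word: str, stem: str) -> None:
        # "Dictionary update": add or correct a single word-to-stem entry.
        self._stems[word] = stem


def accuracy(stemmer: DictionaryStemmer, gold: Iterable[Tuple[str, str]]) -> float:
    """Percentage of (word, expected_stem) pairs that the stemmer gets right."""
    pairs = list(gold)
    if not pairs:
        return 0.0
    correct = sum(1 for word, expected in pairs if stemmer.stem(word) == expected)
    return 100.0 * correct / len(pairs)


# Toy entries for illustration only (plural form -> singular stem).
stemmer = DictionaryStemmer({"کتابیں": "کتاب"})  # "kitaben" (books) -> "kitab" (book)
gold = [
    ("کتابیں", "کتاب"),
    ("لڑکیاں", "لڑکی"),  # "larkiyan" (girls) -> "larki" (girl)
]

before = accuracy(stemmer, gold)    # the second word is not in the dictionary yet
stemmer.update("لڑکیاں", "لڑکی")    # add the missing entry
after = accuracy(stemmer, gold)
```

In this toy setup the accuracy rises once the missing entry is added, which mirrors the abstract's remark that dictionary updates can improve results. A real system would also need to handle normalization of Urdu text before lookup, which this sketch leaves out.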
