DESIGN AND DEVELOPMENT OF DICTIONARY-BASED STEMMER FOR THE URDU LANGUAGE

ZAHID HUSSAIN; SAJID IQBAL; TANZILA SABA; ABDULAZIZ S. ALMAZYAD; AMJAD REHMAN



Abstract

Stemming reduces the many variant forms of a word to its base, stem, or root form, which is essential for language processing applications, including Urdu information retrieval (IR). Urdu is a resource-poor and morphologically rich language. Its multilingual vocabulary is very challenging to process because of its complex morphology. Research on Urdu stemming is about a decade old, yet no work has been reported on dictionary-based Urdu stemming. The present work introduces a dictionary-based Urdu stemmer with improved performance compared to existing Urdu stemmers. The significance of the study is that it identifies the dictionary-based approach as the most promising one for Urdu stemming, especially when combined with a dictionary update feature. Testing shows 94.85% overall accuracy on the test data, and the results can be further improved by cleaning the test data and updating the dictionary.
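The general idea of dictionary lookup with an update feature can be illustrated with a minimal Python sketch. This is not the authors' implementation, which the abstract does not describe: the class name `DictionaryStemmer`, the `accuracy` helper, the choice to return unknown words unchanged, and the handful of example word pairs are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple


class DictionaryStemmer:
    """Toy dictionary-based stemmer: exact lookup of surface form -> stem.

    Hypothetical illustration only; not the stemmer described in the paper.
    """

    def __init__(self, entries: Dict[str, str]) -> None:
        # Copy so that later updates do not mutate the caller's dictionary.
        self._stems = dict(entries)

    def stem(self, word: str) -> str:
        # Assumption: a word missing from the dictionary is returned unchanged.
        return self._stems.get(word, word)

    def update(self, word: str, stem: str) -> None:
        # "Dictionary update": add or correct an entry after an error is found.
        self._stems[word] = stem


def accuracy(stemmer: DictionaryStemmer, gold: Iterable[Tuple[str, str]]) -> float:
    """Percentage of (word, expected_stem) pairs the stemmer gets right."""
    pairs = list(gold)
    if not pairs:
        return 0.0
    correct = sum(1 for word, expected in pairs if stemmer.stem(word) == expected)
    return 100.0 * correct / len(pairs)


# Illustrative entries (assumed): two inflected forms of "kitab" (book).
stemmer = DictionaryStemmer({"کتابیں": "کتاب", "کتابوں": "کتاب"})

# Small hand-made gold set (assumed), including one word not yet in the dictionary.
gold = [
    ("کتابیں", "کتاب"),
    ("کتابوں", "کتاب"),
    ("لڑکیاں", "لڑکی"),  # "larkiyan" (girls) -> "larki" (girl)
    ("کتاب", "کتاب"),
]

before = accuracy(stemmer, gold)   # the unseen word is stemmed incorrectly
stemmer.update("لڑکیاں", "لڑکی")    # add the missing entry
after = accuracy(stemmer, gold)
```

In this toy setup, adding a single missing entry raises the measured accuracy, which mirrors the abstract's remark that results can improve through dictionary updates. A practical stemmer would also need to handle text normalization and a far larger lexicon.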


Bibliographic record

Source: Journal of Theoretical and Applied Information Technology, 2017, Issue 15, 1 page
Authors: ZAHID HUSSAIN; SAJID IQBAL; TANZILA SABA; ABDULAZIZ S. ALMAZYAD; AMJAD REHMAN
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
