Conference: International Conference on Futuristic Trends on Computational Analysis and Knowledge Management

Design Development of Rule Based Inflectional and Derivational Urdu Stemmer 'Usal'


Abstract

Urdu is a morphologically rich language: a single Urdu word can appear in many variant forms. Morphology, the study of word structure, plays an important role in Natural Language Processing. In this paper, we focus on Urdu and develop a rule-based Urdu stemmer that handles both inflectional and derivational forms. Stemming is a branch of morphology; in general, it is the process of extracting the 'root' of a word by separating its affixes. A simple rule-based stemming algorithm raises the problems of under-stemming and over-stemming. To reduce under-stemming, we use a longest-suffix-stripping algorithm; to reduce over-stemming, we maintain a database of exception words and stop-words.
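The two remedies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the suffix list, stop-word set, exception set, the `min_stem_len` guard, and the function name `stem` are all assumptions chosen for illustration, and the sketch handles suffixes only, not prefixes or derivational rules. Characters are written as Unicode escapes, with the glyphs shown in comments.

```python
# Illustrative data only: these tiny lists are assumptions for this sketch,
# not the rule set or exception database described in the paper.
SUFFIXES = [
    "\u0648\u06BA",  # وں, as in کتابوں
    "\u06CC\u06BA",  # یں, as in کتابیں
    "\u06BA",        # ں on its own
]
STOP_WORDS = {
    "\u0646\u06C1\u06CC\u06BA",  # نہیں
    "\u0645\u06CC\u06BA",        # میں
}
EXCEPTION_WORDS = {
    "\u0645\u0627\u06BA",        # ماں: the final ں belongs to the word
    "\u062C\u06C1\u0627\u06BA",  # جہاں: the final ں belongs to the word
}


def stem(word, suffixes=SUFFIXES, min_stem_len=2):
    """Strip the longest matching suffix from `word` (hypothetical helper)."""
    # Protected words are returned unchanged to limit over-stemming.
    if word in STOP_WORDS or word in EXCEPTION_WORDS:
        return word
    # Try longer suffixes first so the longest match wins,
    # which limits under-stemming.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    kitabon = "\u06A9\u062A\u0627\u0628\u0648\u06BA"  # کتابوں
    print(stem(kitabon))                       # longest match strips وں
    print(stem(kitabon, suffixes=["\u06BA"]))  # only ں is stripped
    print(stem("\u0645\u0627\u06BA"))          # exception word, unchanged
```

With only the one-character suffix available, کتابوں keeps a stray و, which is the under-stemming case that longest-match ordering avoids. The exception and stop-word lookups run before any stripping, so protected words such as ماں are never shortened.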
