International conference on computational linguistics
A Light Weight Stemmer for Urdu Language: A Scarce Resourced Language

Abstract

Stemming is a procedure that conflates morphologically related terms into a single term without performing a complete morphological analysis. The Urdu language poses several challenges for Natural Language Processing (NLP), largely because of its rich morphology. A stemmer, which reduces a word to its stem form, is a core tool of information retrieval (IR). Because of the diverse nature of Urdu, developing a stemmer for an Urdu IR system is a challenging task. This paper presents a lightweight stemmer for Urdu text that uses a rule-based approach. Exception lists are developed to improve the stemmer's accuracy. The stemmer's results are satisfactory, and it can be effective in an IR system.
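
The abstract gives no rule set, so the following is only a minimal sketch of what a rule-based light stemmer with an exception list might look like. The suffix list, the exception entry, the minimum stem length, and the name `light_stem` are all illustrative assumptions, not the paper's actual rules or lists. The sketch strips suffixes only, and tries the longest suffix first.

```python
# Hypothetical rule set: these suffixes and exceptions are illustrative
# assumptions, not the lists developed in the paper.
SUFFIXES = sorted(["یاں", "وں", "یں", "ات"], key=len, reverse=True)

# Words that end in a suffix-like string but must not be stripped.
# "نہیں" (nahin, "not") is an assumed example of such a word.
EXCEPTIONS = {"نہیں"}

# Assumed guard so that very short words are not reduced to a single letter.
MIN_STEM_LENGTH = 2


def light_stem(word: str) -> str:
    """Return a light stem of `word` by stripping at most one known suffix."""
    # Exception list takes priority over every stripping rule.
    if word in EXCEPTIONS:
        return word
    # Try longer suffixes first so that a longer match wins over a shorter one.
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and len(stem) >= MIN_STEM_LENGTH:
            return stem
    # No rule applied: the word is returned unchanged.
    return word


if __name__ == "__main__":
    for w in ["کتابیں", "کتابوں", "نہیں"]:
        print(w, "->", light_stem(w))
```

Checking the exception list before any rule is applied is the design choice that lets a small hand-built list override the general rules. No morphological analysis is involved, which is what keeps such a stemmer "light".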
