A STATISTICAL BASED PART OF SPEECH TAGGER FOR URDU LANGUAGE

Abstract

In this paper we present a pioneering step in designing an n-gram based part-of-speech tagger for the Urdu language. In the last few years, part-of-speech tagging work has been done for English, South Asian, and European languages. Our focus is the disambiguation problem: assigning the correct tag to every word from its set of possible tags. Our approach employs an n-gram Markov model, trained on an annotated Urdu corpus, to assign tags to text. The proposed n-gram part-of-speech tagger has been tested and achieved state-of-the-art performance of 95.0%. Furthermore, we compare our experimental results on two types of tagset. Along the way, we apply an evaluation method that shows how significant our experimental results are. In addition, we present an error analysis (confusion matrix) and show an example of Urdu tagging. We also present an overview of the Urdu language. The contribution of our work is an initial step toward a statistical part-of-speech tagger for Urdu.
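To make the disambiguation step concrete, the following is a minimal sketch of a bigram (n = 2) Markov model tagger with Viterbi decoding. It is not the authors' implementation: the toy English corpus, the three-tag tagset, the add-one smoothing, and the names `TRAIN`, `train_bigram_tagger`, `log_prob`, and `viterbi` are all assumptions chosen for illustration, since the abstract does not specify the n-gram order, the smoothing method, or the Urdu tagsets.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus; NOT the paper's Urdu corpus or tagset.
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("dogs", "NOUN"), ("bark", "VERB")],
    [("the", "DET"), ("bark", "NOUN"), ("peels", "VERB")],
]


def train_bigram_tagger(sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    vocab = set()
    for sent in sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab


def log_prob(counter, key, n_outcomes, alpha=1.0):
    """Add-alpha smoothed log probability (smoothing choice is an assumption)."""
    total = sum(counter.values())
    return math.log((counter[key] + alpha) / (total + alpha * n_outcomes))


def viterbi(words, model):
    """Return the most probable tag sequence under the bigram model."""
    if not words:
        return []
    trans, emit, tags, vocab = model
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra slot for unseen words
    # best[i][tag] = (log score of best path ending in tag, previous tag)
    best = [{
        t: (log_prob(trans["<s>"], t, n_tags)
            + log_prob(emit[t], words[0], n_words), None)
        for t in tags
    }]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = log_prob(emit[t], words[i], n_words)
            column[t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                for p in tags
            )
        best.append(column)
    # Follow back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


model = train_bigram_tagger(TRAIN)
print(viterbi(["the", "bark", "sleeps"], model))
```

In this toy corpus "bark" occurs as both a noun and a verb, so the tagger has to use the neighbouring tags to choose between them, which is the disambiguation problem the abstract describes. A trigram or higher-order variant would condition each transition on more than one preceding tag.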