A STATISTICAL BASED PART OF SPEECH TAGGER FOR URDU LANGUAGE

Abstract

In this paper we present a pioneering step in designing an n-gram based part-of-speech tagger for the Urdu language. In the last few years, part-of-speech tagging work has been done for English, South Asian and European languages. Our focus is on the disambiguation problem: assigning the correct tag to every word from a set of possible tags. Our approach employs an n-gram Markov model, trained on an annotated Urdu corpus, to assign tags to text. The proposed n-gram part-of-speech tagger was tested and achieved state-of-the-art performance of 95.0%. Furthermore, we compare our experimental results on two types of tagset. Along the way, we apply an evaluation method that shows how significant our experimental results are. In addition, we present an error analysis (confusion matrix) and show an example of Urdu tagging. We also present an overview of the Urdu language. The contribution of our work is an initial step toward a statistical Urdu part-of-speech tagger.
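The abstract does not give the model order, smoothing, or decoding procedure, so the following is only a minimal sketch of how an n-gram Markov model can disambiguate tags, assuming a bigram model, add-one smoothing, and Viterbi decoding. The toy English corpus, the tag names, and the helper names (`train`, `log_prob`, `viterbi`, `START`) are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical pseudo-tag marking the start of a sentence


def train(tagged_sentences):
    """Count tag bigrams and tag-to-word emissions from an annotated corpus."""
    transitions = defaultdict(Counter)  # previous tag -> counts of next tag
    emissions = defaultdict(Counter)    # tag -> counts of emitted words
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word] += 1
            prev = tag
    vocab = {w for counts in emissions.values() for w in counts}
    return transitions, emissions, vocab


def log_prob(counter, key, num_outcomes, alpha=1.0):
    """Add-alpha smoothed log probability (the smoothing is an assumption)."""
    total = sum(counter.values())
    return math.log((counter[key] + alpha) / (total + alpha * num_outcomes))


def viterbi(words, transitions, emissions, vocab):
    """Return the most probable tag sequence under the bigram model."""
    if not words:
        return []
    tags = sorted(emissions)
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra outcome reserved for unknown words
    # best[i][t] = (log score of the best path ending in tag t, previous tag)
    best = [{}]
    for t in tags:
        score = (log_prob(transitions[START], t, n_tags)
                 + log_prob(emissions[t], words[0], n_words))
        best[0][t] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            emit = log_prob(emissions[t], words[i], n_words)
            best[i][t] = max(
                (best[i - 1][p][0] + log_prob(transitions[p], t, n_tags) + emit, p)
                for p in tags
            )
    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return list(reversed(path))


# Toy annotated corpus (made up for illustration, not data from the paper).
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("run", "NOUN"), ("ends", "VERB")],
    [("dogs", "NOUN"), ("run", "VERB")],
]
transitions, emissions, vocab = train(corpus)
print(viterbi(["the", "run", "ends"], transitions, emissions, vocab))
print(viterbi(["dogs", "run"], transitions, emissions, vocab))
```

In this toy setup the ambiguous word "run" is tagged as a noun after "the" and as a verb after "dogs", which is the kind of context-driven disambiguation the abstract describes. A real tagger would be trained on an annotated Urdu corpus with the tagset chosen by the authors.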

Bibliographic details

Source
Proceedings of the 2007 International Conference on Machine Learning and Cybernetics | 2007 | pp. 3418-3424 | 7 pages
Authors
WAQAS ANWAR; XUAN WANG; LU LI; XIAO-LONG WANG
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
Keywords
Urdu language; Language model; Part-of-speech tagging

Similar literature

1. Hidden Markov Model Based Part of Speech Tagger for Urdu [J]. Waqas Anwar, Xuan Wang, Lu Li. Information Technology Journal. 2007, No. 8
2. Training and Evaluating a Statistical Part of Speech Tagger for Natural Language Applications using Kepler Workflows [J]. Doug Briesch, Reginald Hobbs, Claire Jaja. Procedia Computer Science. 2012, No. 1
3. A tree-based statistical language model for natural language speech recognition [J]. Bahl L.R., Brown P.F. IEEE Transactions on Acoustics, Speech, and Signal Processing. 1989, No. 7
4. A STATISTICAL BASED PART OF SPEECH TAGGER FOR URDU LANGUAGE [C]. WAQAS ANWAR, XUAN WANG, LU LI. International Conference on Machine Learning and Cybernetics. 2007
5. Perceptions of speech-language pathologists and speech-language pathology supervisors regarding personnel shortages in the educational setting [D]. McGregor, Andrea Petersen. 2008
6. Seeking Temporal Predictability in Speech: Comparing Statistical Approaches on 18 World Languages [O]. Yannick Jadoul, Andrea Ravignani, Bill Thompson. 2016
7. Language Variations at the Gender Level: A Sociolinguistic Investigation of Language Varieties used by Women among the Urdu Speech Community of North India [O]. Fatima Anjum. 2010
8. Low-Resource Speech Translation of Urdu to English Using Semi-Supervised Part-of-Speech Tagging and Transliteration [R]. Aminzadeh, A. R., Shen, W. 2008
