Journal: Language Resources and Evaluation

Urdu part of speech tagging using conditional random fields

Abstract

Part of speech (POS) tagging, the assignment of syntactic categories to words in running text, is significant to natural language processing as a preliminary task in applications such as speech processing, information extraction, and others. Urdu language processing presents a challenge because various Urdu POS tags behave differently in differing situations (morphosyntactic ambiguity). This paper addresses this challenge by developing a novel tagging approach using linear-chain conditional random fields (CRF). Our work is the first instance of a CRF approach for Urdu POS tagging. The proposed model employs a strong, stable and balanced feature set that combines language-independent and language-dependent features. The language-dependent features considered include the part-of-speech tag of the previous word and the suffix of the current word, while the language-independent features include the 'context words window'. Our approach was evaluated on two benchmark datasets against support vector machine techniques for Urdu POS tagging, which are considered the state of the art. The results show that our CRF approach improves upon the F-measure of prior attempts by 8.3-8.5%.
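
The feature set named in the abstract (previous word's POS tag, suffix of the current word, and a context words window) could be sketched roughly as follows. This is only an illustration and not the authors' implementation: the function name `token_features`, the window size, the maximum suffix length, the `<PAD>` marker, the romanized example tokens, and the tag label `PRON` are all assumptions introduced here. The resulting per-token dictionaries are the kind of input a linear-chain CRF toolkit typically consumes.

```python
from typing import Dict, List, Optional


def token_features(
    words: List[str],
    i: int,
    prev_tag: Optional[str] = None,
    window: int = 2,       # assumed context size; the abstract does not give one
    max_suffix: int = 3,   # assumed maximum suffix length
) -> Dict[str, str]:
    """Build a feature dictionary for the token at position i (hypothetical helper)."""
    word = words[i]
    feats: Dict[str, str] = {"bias": "1", "word": word}

    # Suffixes of the current word (only those shorter than the word itself).
    for n in range(1, max_suffix + 1):
        if len(word) > n:
            feats[f"suffix{n}"] = word[-n:]

    # Context words window: neighbours to the left and right, padded at the edges.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        feats[f"w[{offset:+d}]"] = words[j] if 0 <= j < len(words) else "<PAD>"

    # POS tag of the previous word, when it is known (e.g. from training data).
    if prev_tag is not None:
        feats["prev_tag"] = prev_tag

    return feats


# Illustrative romanized tokens and tag label; both are assumptions.
sentence = ["yeh", "kitab", "achhi", "hai"]
features = token_features(sentence, 1, prev_tag="PRON")
for name, value in sorted(features.items()):
    print(name, "=", value)
```

In a linear-chain CRF the dependency on the previous tag is usually modelled by the transition factors rather than passed in as an observed feature, so the `prev_tag` argument here is just one way to make that dependency explicit in a sketch.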