Conference paper, Workshop on Arabic natural language processing

POS-tagging of Tunisian Dialect Using Standard Arabic Resources and Tools


Abstract

Developing natural language processing tools usually requires a large number of resources (lexica, annotated corpora, etc.), which often do not exist for less-resourced languages. One way to overcome this lack of resources is to devote substantial effort to building new ones from scratch. Another approach is to exploit existing resources of closely related languages. In this paper, we focus on developing a part-of-speech tagger for the Tunisian Arabic dialect (TUN), a low-resource language, by exploiting its closeness to Modern Standard Arabic (MSA), which has many state-of-the-art resources and tools. Our system achieved an accuracy of 89% (~20% absolute improvement over an MSA tagger baseline).
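
The reported figures can be read more concretely with a small sketch. It assumes accuracy is measured per token (the abstract does not state the evaluation procedure), and the tag sequences below are made-up toy data, not taken from the paper. The function names are illustrative only. The sketch also shows why "absolute" matters: an improvement of about 20 points to 89% implies a baseline of roughly 69%, which is a different quantity from the relative gain.

```python
def tagging_accuracy(gold, predicted):
    """Fraction of tokens whose predicted tag equals the gold tag.

    Assumption: token-level accuracy; the abstract does not say how
    accuracy was computed.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def absolute_improvement(system_acc, baseline_acc):
    """Accuracy difference in percentage points."""
    return (system_acc - baseline_acc) * 100


def relative_improvement(system_acc, baseline_acc):
    """Accuracy difference as a percentage of the baseline accuracy."""
    return (system_acc - baseline_acc) / baseline_acc * 100


# Hypothetical toy tags, for illustration only.
gold = ["NOUN", "VERB", "PREP", "NOUN", "ADJ"]
predicted = ["NOUN", "VERB", "PREP", "ADJ", "ADJ"]
toy_accuracy = tagging_accuracy(gold, predicted)  # 4 of 5 tags match

# Figures from the abstract: 89% accuracy, about 20 points above baseline.
system_acc = 0.89
implied_baseline = system_acc - 0.20  # approximate, since the gain is "~20%"

print(f"toy accuracy: {toy_accuracy:.2f}")
print(f"implied baseline: {implied_baseline:.2f}")
print(f"absolute gain: {absolute_improvement(system_acc, implied_baseline):.1f} points")
print(f"relative gain: {relative_improvement(system_acc, implied_baseline):.1f}%")
```

Under these assumptions the same result corresponds to a relative gain of about 29% over the implied baseline, so the two ways of stating an improvement should not be confused.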
