Conference: Workshop on Arabic Natural Language Processing

POS-tagging of Tunisian Dialect Using Standard Arabic Resources and Tools


Abstract

Developing natural language processing tools usually requires a large number of resources (lexica, annotated corpora, etc.), which often do not exist for less-resourced languages. One way to overcome this lack of resources is to devote substantial effort to building new ones from scratch. Another approach is to exploit the existing resources of closely related languages. In this paper, we focus on developing a part-of-speech tagger for the Tunisian Arabic dialect (TUN), a low-resource language, by exploiting its closeness to Modern Standard Arabic (MSA), which has many state-of-the-art resources and tools. Our system achieved an accuracy of 89% (~20% absolute improvement over an MSA tagger baseline).
