...
Language Resources and Evaluation

Curras: an annotated corpus for the Palestinian Arabic dialect

Abstract

In this article we present Curras, the first morphologically annotated corpus of the Palestinian Arabic dialect. Palestinian Arabic is one of the many primarily spoken dialects of Arabic. Arabic dialects are generally under-resourced compared to Modern Standard Arabic, the primarily written and official form of the language. We begin with background that situates Palestinian Arabic linguistically and historically and compares it to Modern Standard Arabic and Egyptian Arabic in terms of phonological, morphological, orthographic, and lexical variation. We then describe the methodology we developed for collecting Palestinian Arabic text so as to ensure a variety of representative domains and genres. We also discuss our annotation process, which extended previous efforts in annotation guideline development and used existing automatic annotation tools for Standard Arabic and Egyptian Arabic. The annotation guidelines and annotation metadata are described in detail. The Curras Palestinian Arabic corpus consists of more than 56K tokens, annotated with rich morphological and lexical features. Inter-annotator agreement results indicate a high degree of consistency.