Building a Corpus for Arabic Dialects Using Games with a Purpose

2015 First International Conference on Arabic Computational Linguistics

There is a huge gap between the written form of Arabic, Modern Standard Arabic (MSA), and the spoken Arabic dialects, owing to the large number of dialects. In addition, most Arabic datasets are built from MSA content. Traditional ways of identifying the dialect of a text are time-consuming and costly. Furthermore, because of the morphological complexity of Arabic, the gender of the speaker may change the structure of an Arabic sentence. Thus, dialectal text holds rich information, such as the origin of the speaker and the gender of the addressee. A Game With A Purpose (GWAP) called "3ammeya" is implemented to identify the dialects of Arabic sentences along with their MSA translations. Through the game, the genders of the speaker and the addressee are also classified. The collected data will help construct an expandable, low-cost corpus for dialect identification and translation to MSA.
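The abstract does not say how 3ammeya combines the answers of different players into a single corpus entry. The sketch below is only an illustration under stated assumptions: each player is assumed to submit, per sentence, a dialect label, an MSA translation, and the speaker and addressee genders, and the categorical labels are assumed to be merged by simple majority vote (a common crowdsourcing technique, not necessarily the authors' method). The names `PlayerAnswer`, `majority`, `aggregate`, the `min_votes` threshold, and the dialect labels are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlayerAnswer:
    """One player's labels for a single Arabic sentence (hypothetical schema)."""
    sentence: str
    dialect: str           # hypothetical label set, e.g. "Egyptian", "Levantine"
    msa_translation: str   # the player's MSA rendering of the sentence
    speaker_gender: str    # assumed values: "male" / "female"
    addressee_gender: str  # assumed values: "male" / "female"


def majority(values: list[str], min_votes: int) -> Optional[str]:
    """Return the most common value if it has at least min_votes votes, else None."""
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    return value if count >= min_votes else None


def aggregate(answers: list[PlayerAnswer], min_votes: int = 2) -> dict:
    """Merge several players' answers for the same sentence into one corpus entry.

    Categorical fields use a simple majority vote (an assumption); free-text
    MSA translations are kept as distinct candidates, in first-seen order.
    """
    sentences = {a.sentence for a in answers}
    if len(sentences) != 1:
        raise ValueError("all answers must refer to the same sentence")
    translations = list(dict.fromkeys(a.msa_translation for a in answers))
    return {
        "sentence": sentences.pop(),
        "dialect": majority([a.dialect for a in answers], min_votes),
        "speaker_gender": majority([a.speaker_gender for a in answers], min_votes),
        "addressee_gender": majority([a.addressee_gender for a in answers], min_votes),
        "msa_candidates": translations,
    }


# Usage: three hypothetical players label the Egyptian greeting "ازيك" ("how are you").
answers = [
    PlayerAnswer("ازيك", "Egyptian", "كيف حالك؟", "male", "male"),
    PlayerAnswer("ازيك", "Egyptian", "كيف حالك؟", "female", "male"),
    PlayerAnswer("ازيك", "Levantine", "كيف الحال؟", "male", "female"),
]
entry = aggregate(answers)
print(entry)
```

Keeping translations as a list of candidates rather than voting on them reflects the fact that two correct MSA renderings rarely match string-for-string; a real system would need a separate way to rank or validate them.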