Procedia Computer Science

Altruistic Crowdsourcing for Arabic Speech Corpus Annotation


Abstract

Crowdsourcing is an emerging collaborative approach that can be used to annotate linguistic resources effectively. It comes in several genres: paid-for, games with a purpose, and altruistic (volunteer-based) approaches. In this paper, we investigate the use of altruistic crowdsourcing for speech corpus annotation by recounting our experience of validating a semi-automatic dialect annotation task for Kalam’DZ, a corpus dedicated to Algerian Arabic dialectal varieties. We start by describing the whole process of designing an altruistic crowdsourcing project. Using the unpaid Crowdcrafting platform, we performed experiments on a 10% sample of the Kalam’DZ corpus, totaling more than 10 h of speech from 1012 speakers. We evaluate this crowdsourcing job by comparing it with a gold-standard annotation produced by experts, which shows a high level of inter-annotator agreement of 81%. Our results confirm that altruistic crowdsourcing is an effective approach for speech dialect annotation. In addition, we present a set of best practices for altruistic crowdsourcing for corpus annotation.
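The comparison against an expert gold standard can be illustrated with a small sketch. The abstract does not say how the 81% figure was computed, so everything here is an assumption for illustration only: the volunteer votes are aggregated by majority vote, agreement is taken as the simple share of clips where the aggregated label matches the expert label, and the clip IDs and dialect labels (`A`, `B`, `C`) are made up.

```python
from collections import Counter


def majority_label(votes):
    """Return the most frequent label among volunteer votes.

    Ties go to the label that was seen first (Counter keeps insertion order).
    """
    return Counter(votes).most_common(1)[0][0]


def percent_agreement(crowd_votes, gold):
    """Share of clips whose majority crowd label equals the expert label.

    Only clips present in both mappings are compared.
    """
    shared = [clip for clip in gold if clip in crowd_votes]
    if not shared:
        raise ValueError("no clips in common between crowd and gold annotations")
    matches = sum(majority_label(crowd_votes[clip]) == gold[clip] for clip in shared)
    return matches / len(shared)


# Hypothetical toy data, not taken from the Kalam'DZ corpus.
crowd_votes = {
    "clip_01": ["A", "A", "B"],
    "clip_02": ["B", "B", "B"],
    "clip_03": ["C", "A", "A"],
    "clip_04": ["B", "C", "C"],
}
gold = {"clip_01": "A", "clip_02": "B", "clip_03": "C", "clip_04": "C"}

print(f"{percent_agreement(crowd_votes, gold):.0%}")  # prints 75%
```

Raw percent agreement is the simplest possible measure; a chance-corrected statistic such as Cohen's kappa could be substituted if the label distribution is skewed.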
