International Conference on Natural Language and Speech Processing

A Crowdsourcing-based Approach for Speech Corpus Transcription Case of Arabic Algerian Dialects


Abstract

In this paper, we describe a corpus annotation project that uses crowdsourcing to produce an orthographic transcription of the KALAM'DZ corpus (Bougrine et al., 2017c), a speech corpus dedicated to Algerian Arabic dialectal varieties. We resort to crowdsourcing to avoid the time and cost of solutions that rely on experts. Since Arabic dialects have no standard orthography, we defined a set of guidelines that help the crowd produce more normalized transcriptions. We performed experiments on a sample of 10% of the KALAM'DZ corpus, totaling 8.75 hours. Quality control of the output transcription is ensured in three stages: pre-qualification of the crowd, online filtering, and in-lab validation and revision. A baseline resource is used to evaluate the first two stages; it consists of 5% of the targeted dataset, transcribed by well-trained transcribers. Our results confirm that crowdsourcing is an effective approach for transcribing dialectal speech when dealing with under-resourced dialects. Before validation by the well-trained transcribers, the accuracy of the transcriptions reached 74.38. In addition, we present a set of best practices for crowdsourcing speech corpus transcription.
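
The abstract does not say how transcription accuracy is measured or how the online filtering stage decides which contributions to keep. As a rough illustration only, the sketch below assumes a word-level accuracy (one minus the word error rate against the expert baseline) and a hypothetical acceptance threshold of 0.7. The function names, the threshold, and the placeholder tokens are all assumptions and do not come from the paper.

```python
from typing import Dict, List


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance between two token lists (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed metric: 1 - WER, clipped at 0. Not necessarily the paper's."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        return 1.0 if not hyp else 0.0
    errors = word_edit_distance(ref, hyp)
    return max(0.0, 1.0 - errors / len(ref))


def filter_workers(baseline: Dict[str, str],
                   submissions: Dict[str, Dict[str, str]],
                   threshold: float = 0.7) -> List[str]:
    """Keep workers whose mean accuracy on baseline segments meets the
    (hypothetical) threshold. Segments outside the baseline are ignored."""
    accepted = []
    for worker, texts in submissions.items():
        scores = [word_accuracy(baseline[seg], text)
                  for seg, text in texts.items() if seg in baseline]
        if scores and sum(scores) / len(scores) >= threshold:
            accepted.append(worker)
    return accepted


# Placeholder tokens stand in for real transcriptions.
baseline = {"seg1": "a b c d"}
submissions = {
    "worker1": {"seg1": "a b c d"},
    "worker2": {"seg1": "x y z d"},
}
accepted = filter_workers(baseline, submissions)
```

Scoring only the segments that overlap with the expert-transcribed baseline mirrors the abstract's description of a small reference subset used to evaluate the crowd. A real pipeline would also need text normalization suited to dialectal Arabic before comparison.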
