International Conference on Advanced Intelligent Systems and Informatics

Crowdsourcing Speech and Language Data for Resource-Poor Languages

Abstract

In this paper, we present the benefits of using crowdsourcing to build speech and language resources for different annotation tasks, using dialectal Arabic as an example of a resource-poor language. We present recommendations for job design and quality control that allow us to build high-quality data for a variety of tasks. Most of these recommendations are language-independent and can be applied to other languages as well. We summarize lessons learned from experiments in data acquisition tasks such as image annotation (transcription of Arabic historical documents), machine translation (translation from English to Hindi), speech annotation (transcription of dialectal Arabic audio files), text annotation (conversion from dialectal Arabic to Modern Standard Arabic (MSA)), and text classification (annotation of offensive language on Arabic social media and classification of questions on Arabic medical web forums).