Crowdsourcing Speech and Language Data for Resource-Poor Languages


Abstract

In this paper, we present the benefits of using crowdsourcing to build speech and language resources for different annotation tasks, taking dialectal Arabic as an example of a resource-poor language. We give recommendations for job design and quality control that allow us to build high-quality data for a variety of tasks. Most of these recommendations are language-independent and can be applied to other languages as well. We summarize lessons learned from experiments in data acquisition tasks, such as image annotation (transcription of Arabic historical documents), machine translation (translation from English to Hindi), speech annotation (transcription of dialectal Arabic audio files), text annotation (conversion from dialectal Arabic to Modern Standard Arabic (MSA)), and text classification (annotation of offensive language on Arabic social media, and classification of questions on Arabic medical web forums).


Bibliographic record

Source
International Conference on Advanced Intelligent Systems and Informatics | 2017 | xx 917 p. | 8 pages
Author
Hamdy Mubarak
Original format: PDF
Chinese Library Classification: TP182-532
Keywords
Crowdsourcing; Dialectal Arabic; Low-resource languages


