Automatically Building VoIP Speech Parallel Corpora for Arabic Dialects

KHALID ALMEMAN

Abstract

This article discusses the process of automatically building Arabic multi-dialect speech corpora using Voice over Internet Protocol (VoIP). The Asterisk framework was adopted to act as the main connection between the parties, for which two virtual machines were created: a sender and a receiver. The sender makes a VoIP call to the receiver using the Asterisk framework, while the receiver records the call automatically; this process is repeated for every audio file in the corpora. In this work, more than 67,000 automatic calls were made between the sender and receiver machines, generating VoIP Arabic corpora for four Arabic dialects. The resulting corpora can be considered the first Arabic VoIP parallel speech corpora and will be made freely available to researchers in Arabic NLP and speech recognition.
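
The sender side of the loop described above could be sketched as follows. This is only an illustration under stated assumptions, not the author's implementation: the abstract does not say how calls were triggered, so the sketch assumes Asterisk call files, one per audio file. The channel name `SIP/receiver/1000`, the `.wav` extension, the `.call` suffix, and the function names are all hypothetical.

```python
from pathlib import Path


def build_call_file(audio_path: Path, channel: str = "SIP/receiver/1000") -> str:
    """Return the text of an Asterisk call file that dials `channel`
    and plays a single audio file once the call is answered.

    The channel string is a hypothetical placeholder, not taken from the paper.
    """
    # Asterisk's Playback application expects the file name without extension.
    sound = audio_path.with_suffix("").as_posix()
    lines = [
        f"Channel: {channel}",
        "MaxRetries: 0",   # do not redial if the call fails
        "WaitTime: 30",    # seconds to wait for the receiver to answer
        "Application: Playback",
        f"Data: {sound}",
    ]
    return "\n".join(lines) + "\n"


def write_call_files(audio_dir: Path, out_dir: Path) -> list[Path]:
    """Write one call file per .wav file in `audio_dir`, in sorted order."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for audio in sorted(audio_dir.glob("*.wav")):
        target = out_dir / f"{audio.stem}.call"
        target.write_text(build_call_file(audio))
        written.append(target)
    return written
```

In such a setup, each generated file would then be moved into Asterisk's outgoing spool directory on the sender so that the call is placed, and the receiver's dialplan would have to answer and record the call. Both steps are assumptions about a typical Asterisk deployment rather than details given in the abstract.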

Bibliographic details

Source: ACM transactions on Asian language information processing, 2018, Issue 1, pp. 4.1-4.12 (12 pages)
Author: KHALID ALMEMAN
Affiliation: Qassim University
Original format: PDF
Language: English
Keywords: VoIP corpora; Asterisk; Arabic multi-dialect; Arabic speech recognition
