Journal: ACM Transactions on Asian Language Information Processing

Automatically Building VoIP Speech Parallel Corpora for Arabic Dialects



Abstract

This article discusses the process of automatically building Arabic multi-dialect speech corpora using Voice over Internet Protocol (VoIP). The Asterisk framework was adopted as the main connection between the two parties, which run on two virtual machines created for this purpose: a sender and a receiver. The sender places a VoIP call to the receiver through Asterisk, and the receiver records the call automatically. This process is repeated for every audio file in the corpora. In this work, more than 67,000 automatic calls were made between the sender and receiver machines, producing VoIP Arabic corpora for four Arabic dialects. The resulting corpora can be considered the first Arabic VoIP parallel speech corpora and will be made freely available to researchers in Arabic NLP and speech recognition.
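The abstract does not say how the sender triggers each call. One common way to script outgoing calls in Asterisk is the "call file" mechanism, where Asterisk dials any call file that appears in its outgoing spool directory. The sketch below is a minimal illustration under that assumption. It writes one call file per audio file, and each call dials the receiver and plays that utterance with `Playback()`. The endpoint name `PJSIP/receiver`, the caller ID, the retry values, the `.call` file naming scheme and the helper names `call_file_text` and `queue_calls` are all hypothetical. None of them comes from the paper.

```python
import os
from pathlib import Path

# Hypothetical endpoint name: the paper does not say how the receiver is addressed.
RECEIVER_ENDPOINT = "PJSIP/receiver"


def call_file_text(audio_file: str, endpoint: str = RECEIVER_ENDPOINT) -> str:
    """Build an Asterisk call file that dials `endpoint` and plays one utterance."""
    # Playback() expects the sound name without its file extension.
    sound = Path(audio_file).with_suffix("").as_posix()
    fields = {
        "Channel": endpoint,        # who to call
        "CallerID": "corpus-sender",
        "MaxRetries": "2",          # retry a failed call up to twice
        "RetryTime": "30",          # seconds between retries
        "WaitTime": "45",           # seconds to wait for an answer
        "Application": "Playback",  # once answered, play the utterance
        "Data": sound,
    }
    return "".join(f"{key}: {value}\n" for key, value in fields.items())


def queue_calls(audio_files, spool_dir, staging_dir) -> list:
    """Queue one call per utterance by moving call files into Asterisk's spool.

    Each file is written to `staging_dir` first and then renamed into
    `spool_dir`, so Asterisk never reads a half-written file. os.replace is
    only atomic when both directories are on the same filesystem.
    """
    queued = []
    for audio_file in audio_files:
        sound = Path(audio_file).with_suffix("").as_posix()
        # Flatten the relative path so files from different dialect folders
        # with the same base name do not overwrite each other.
        name = sound.strip("/").replace("/", "_") + ".call"
        staged = Path(staging_dir) / name
        staged.write_text(call_file_text(audio_file))
        target = Path(spool_dir) / name
        os.replace(staged, target)
        queued.append(target)
    return queued
```

On a real sender, `spool_dir` would normally be Asterisk's outgoing spool, commonly `/var/spool/asterisk/outgoing`. On the receiver, one way to save each incoming call is a dialplan extension that runs `Answer()` and then `Record()` (or `MixMonitor()`). The paper does not state which method it used. Queuing tens of thousands of files at once would start that many calls at the same moment, so a real run would probably need to feed the spool in small batches.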
