Conference: Spoken dialogue systems for ambient environments

Impact of a Newly Developed Modern Standard Arabic Speech Corpus on Implementing and Evaluating Automatic Continuous Speech Recognition Systems

Abstract

As the current formal linguistic standard and the only form of Arabic acceptable to all native speakers, Modern Standard Arabic (MSA) still lacks sufficient spoken corpora compared to other forms such as Dialectal Arabic. This paper describes our work towards developing a new speech corpus for MSA, which can be used for implementing and evaluating any Arabic automatic continuous speech recognition system. The speech corpus contains 415 sentences (367 for training and 48 for testing) recorded by 42 native Arabic speakers (21 male and 21 female) from 11 countries representing three major regions (Levant, Gulf, and Africa). The impact of using this speech corpus on the overall performance of Arabic automatic continuous speech recognition systems was examined. Two development phases were conducted based on the size of the training data, the number of Gaussian mixture distributions, and the number of tied states (senones). Overall results indicate that a larger training data size results in higher word recognition rates and lower word error rates (WER).
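The abstract reports results as word error rate (WER) but does not say how it was computed. As a minimal sketch, assuming the conventional definition (substitutions, deletions, and insertions from a word-level edit-distance alignment, divided by the number of reference words), WER could be calculated as follows. The function name `word_error_rate` and the example sentences are hypothetical and are not taken from the paper or its corpus.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumes whitespace tokenization; a real evaluation would normally
    apply the text normalization used by the recognizer's scoring tool.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one substitution and one deletion
    # against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.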
