Published in: Language Resources and Evaluation

Investigating the effects of gender, dialect, and training size on the performance of Arabic speech recognition

Abstract

Research in Arabic automatic speech recognition (ASR) is constrained by datasets that are limited in size and highly variable in content and quality. Arabic-language resources vary in the attributes that affect resources in other languages (noise, channel, speaker, genre), but they also vary significantly in the dialect and level of formality of the spoken Arabic they capture. Many languages show similar levels of cross-dialect and cross-register acoustic variability, but these effects have been under-studied. This paper is an experimental analysis of the interaction between classical ASR corpus-compensation methods (feature selection, data selection, gender-dependent acoustic models) and the dialect-dependent and register-dependent variation among Arabic ASR corpora. The first interaction studied is that between acoustic recording quality and discrete pronunciation variation. Discrete pronunciation variation can be compensated for by using grapheme-based instead of phone-based acoustic models and by filtering out speakers with insufficient training data; the latter technique also helps to compensate for poor recording quality, which is further compensated for by eliminating delta-delta acoustic features. Together, the three techniques reduce word error rate (WER) by between 3.24% and 5.35%. The second aspect of dialect and register variation considered is variation in the fine-grained acoustic pronunciation of each phoneme in the language. Experimental results show that gender and dialect are the principal components of variation in speech; building gender-specific and dialect-specific models therefore leads to substantial decreases in WER.
To further explore the degree of acoustic difference between the phone models required for each Arabic dialect, cross-dialect experiments are conducted to measure how far apart the dialects are acoustically, so as to better decide the minimal number of recognition systems needed to cover all dialectal Arabic. Finally, the research addresses an important question: how much training data is needed to build efficient speaker-independent ASR systems? To answer it, learning curves are developed to determine how large the training set must be to achieve acceptable performance.
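
The abstract reports its results as changes in WER. As a reminder of what that metric measures, here is a minimal sketch of the standard word-level definition: the edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the sample strings are illustrative assumptions, not taken from the paper, and the paper's own scoring tool and text normalization are not described in the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; whitespace tokenization is an
    assumption and ignores any normalization a real scorer would apply.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Toy example: one substitution (b -> x) and one deletion (d)
# against a four-word reference.
print(word_error_rate("a b c d", "a x c"))
```

Because the denominator is the reference length, insertions can push WER above 1.0. The abstract does not say whether the quoted reductions are absolute or relative, so this sketch makes no assumption on that point.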
