Investigating the effects of gender, dialect, and training size on the performance of Arabic speech recognition

Alsharhan Eiman; Ramsay Allan

Abstract

Research in Arabic automatic speech recognition (ASR) is constrained by datasets that are limited in size and highly variable in content and quality. Arabic-language resources vary in the attributes that affect language resources in other languages (noise, channel, speaker, genre), but also vary significantly in the dialect and level of formality of the spoken Arabic they capture. Many languages suffer similar levels of cross-dialect and cross-register acoustic variability, but these effects have been under-studied. This paper is an experimental analysis of the interaction between classical ASR corpus-compensation methods (feature selection, data selection, gender-dependent acoustic models) and the dialect-dependent and register-dependent variation among Arabic ASR corpora. The first interaction studied is that between acoustic recording quality and discrete pronunciation variation. Discrete pronunciation variation can be compensated for by using grapheme-based instead of phone-based acoustic models, and by filtering out speakers with insufficient training data; the latter technique also helps to compensate for poor recording quality, which is further compensated for by eliminating delta-delta acoustic features. Together, the three techniques reduce Word Error Rate (WER) by between 3.24% and 5.35%. The second aspect of dialect and register variation considered is variation in the fine-grained acoustic pronunciation of each phoneme in the language. Experimental results show that gender and dialect are the principal components of variation in speech; building gender-specific and dialect-specific models therefore leads to substantial decreases in WER.
To further explore how much the phone models required for each Arabic dialect differ acoustically, cross-dialect experiments are conducted to measure how far apart the dialects are, so that a better decision can be made about the minimum number of recognition systems needed to cover all dialectal Arabic. Finally, the research addresses an important question: how much training data is needed to build efficient speaker-independent ASR systems? This includes developing learning curves to find out how large the training set must be to achieve acceptable performance.
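
The abstract reports its results as Word Error Rate (WER). As a reminder of what that metric measures, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from the paper, and the paper's own scoring tool may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); tokenization is plain
    whitespace splitting, with no text normalization applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up sentences: one substitution (sat -> sit) and one deletion (the).
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER = {wer:.2%}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference. The abstract does not say whether the reported 3.24% to 5.35% reductions are absolute or relative, so the sketch makes no assumption about that.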

Bibliographic details

Source
Language Resources and Evaluation, 2020, Issue 4, pp. 975-998 (24 pages)
Authors
Alsharhan Eiman; Ramsay Allan
Author affiliations

Kuwait Univ, Kuwait, Kuwait

Univ Manchester, Manchester, Lancs, England

Original format: PDF
Language of text: English

Similar literature

1. Development of the Arabic Loria Automatic Speech Recognition system (ALASR) and its evaluation for Algerian dialect [J]. Mohamed Amine Menacer, Odile Mella, Dominique Fohr. Procedia Computer Science, 2017, Issue 1.
2. Cross-dialectal data sharing for acoustic modeling in Arabic speech recognition [J]. Kirchhoff K, Vergyri D. Speech Communication, 2005, Issue 1.
3. The Influence the Training Set Size Has on the Performance of a Digit Speech Recognition System in Macedonian [C]. Daniel Spasovski, Goran Peshanski, Gjorgji Madjarov. ICT Innovations Conference, 2015.
4. Automatic Dialect and Accent Recognition and its Application to Speech Recognition [D]. Biadsy, Fadi. 2011.
5. Effects of Long-Term Training on Aided Speech-Recognition Performance in Noise in Older Adults [O]. Matthew H. Burk, Larry E. Humes.
6. The effects of speakers' gender, age, and region on overall performance of Arabic automatic speech recognition systems using the phonetically rich and balanced Modern Standard Arabic speech corpus [O]. Sawalha M, Abu Shariah M. 2013.
7. Use of Computer Speech Understanding in Training: A Preliminary Investigation of a Limited Continuous Speech Recognition Capability [R]. Porter, J. E., Grady, M. W., Hicklin, M. B. 1977.
