Journal: IEEE Transactions on Audio, Speech, and Language Processing

A Study on the Generalization Capability of Acoustic Models for Robust Speech Recognition



Abstract

In this paper, we explore the generalization capability of acoustic models for improving speech recognition robustness against noise distortions. While generalization in statistical learning theory originally refers to a model's ability to generalize well on unseen testing data drawn from the same distribution as the training data, we show that good generalization capability is also desirable in mismatched cases. One way to obtain such general models is to use a margin-based model training method, e.g., soft-margin estimation (SME), which enhances the margins between competing models and thereby builds in some tolerance to acoustic mismatches without detailed knowledge of the distortion mechanisms. Experimental results on the Aurora-2 and Aurora-3 connected digit string recognition tasks demonstrate that, by improving the model's generalization capability through SME training, speech recognition performance can be significantly improved in both matched and low-to-medium mismatched testing cases with no language model constraints. Recognition results show that SME performs better with mean and variance normalization than without it, and therefore provides a complementary benefit to conventional feature normalization techniques, so that the two can be combined to further improve system performance. Although this study is focused on noisy speech recognition, we believe the proposed margin-based learning framework can be extended to deal with different types of distortions and robustness issues in other machine learning applications.
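The core idea of the margin-based training described above can be illustrated with a minimal sketch. The toy objective below penalizes utterances whose correct-vs-competitor log-likelihood separation falls short of a chosen margin; the separation values and the margin value are hypothetical, and this hinge-style loss is only a simplified stand-in for the full SME objective in the paper:

```python
import numpy as np

def soft_margin_loss(separations, margin):
    """Hinge-style margin penalty.

    separations: per-utterance separation scores, i.e. log-likelihood of the
    correct model minus that of the best competing model (toy values here).
    Utterances whose separation already exceeds the margin contribute zero;
    the rest are penalized in proportion to how far they fall short.
    """
    return np.maximum(margin - separations, 0.0).sum()

# Toy separation scores for a batch of four utterances (hypothetical).
separations = np.array([2.5, 0.4, -1.0, 3.0])
margin = 1.0  # desired soft margin (hypothetical value)

print(soft_margin_loss(separations, margin))  # 0 + 0.6 + 2.0 + 0 = 2.6
```

Minimizing such a loss pushes training toward models whose correct hypotheses beat their competitors by at least the margin, which is the intuition behind the tolerance to mismatch discussed in the abstract.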
