Journal: IEEE Transactions on Audio, Speech, and Language Processing

A Study on the Generalization Capability of Acoustic Models for Robust Speech Recognition



Abstract

In this paper, we explore the generalization capability of acoustic models for improving speech recognition robustness against noise distortions. While generalization in statistical learning theory originally refers to a model's ability to generalize well on unseen testing data drawn from the same distribution as the training data, we show that good generalization capability is also desirable in mismatched cases. One way to obtain such general models is to use a margin-based model training method, e.g., soft-margin estimation (SME), which enhances the margins between competing models and thereby builds in some tolerance to acoustic mismatches without detailed knowledge of the distortion mechanisms. Experimental results on the Aurora-2 and Aurora-3 connected digit string recognition tasks demonstrate that, by improving the model's generalization capability through SME training, speech recognition performance can be significantly improved in both matched and low-to-medium mismatched testing cases with no language model constraints. Recognition results show that SME performs better with mean and variance normalization than without it, and therefore provides a complementary benefit to conventional feature normalization techniques, so that the two can be combined to further improve system performance. Although this study is focused on noisy speech recognition, we believe the proposed margin-based learning framework can be extended to deal with different types of distortions and robustness issues in other machine learning applications.
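The core idea of the margin-based training described above can be illustrated with a minimal sketch. The toy objective below penalizes utterances whose correct-vs-competitor log-likelihood separation falls short of a chosen margin; the separation values and the margin value are hypothetical, and this hinge-style loss is only a simplified stand-in for the full SME objective in the paper:

```python
import numpy as np

def soft_margin_loss(separations, margin):
    """Hinge-style margin penalty.

    separations: per-utterance separation scores, i.e. log-likelihood of the
    correct model minus that of the best competing model (toy values here).
    Utterances whose separation already exceeds the margin contribute zero;
    the rest are penalized in proportion to how far they fall short.
    """
    return np.maximum(margin - separations, 0.0).sum()

# Toy separation scores for a batch of four utterances (hypothetical).
separations = np.array([2.5, 0.4, -1.0, 3.0])
margin = 1.0  # desired soft margin (hypothetical value)

print(soft_margin_loss(separations, margin))  # 0 + 0.6 + 2.0 + 0 = 2.6
```

Minimizing such a loss pushes training toward models whose correct hypotheses beat their competitors by at least the margin, which is the intuition behind the tolerance to mismatch discussed in the abstract.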
