Journal: IEEE Transactions on Audio, Speech, and Language Processing

Feature Compensation Techniques for ASR on Band-Limited Speech

Abstract

Band-limited speech (speech for which parts of the spectrum are completely lost) is a major cause of accuracy degradation in automatic speech recognition (ASR) systems, particularly when the acoustic models have been trained on data with a different spectral range. In this paper, we present an extensive study of the problem of ASR of band-limited speech with full-bandwidth acoustic models. Our focus is mainly on band-limited feature compensation, covering even the case of time-varying band-limiting distortions, but we also compare this approach to more common model-side techniques (adaptation and retraining) and explore the combination of feature-based and model-side approaches. The proposed feature compensation algorithms are organized in a unified framework supported by a novel mathematical model of the impact of such distortions on Mel-frequency cepstral coefficient (MFCC) features. A crucial and novel contribution is the analysis of the relative correlation of different elements in the MFCC feature vector for full-bandwidth and limited-bandwidth speech, which justifies an important modification in the feature compensation scheme. Furthermore, an intensive experimental analysis is provided. Experiments are conducted on real telephone channels, as well as on artificial low-pass and bandpass filters applied to TIMIT data, and results are given for different experimental constraints and variations of the feature compensation method. Results for other well-known robustness approaches, such as cepstral mean normalization (CMN), model retraining, and model adaptation, are also given for comparison. ASR performance with our approach is similar to or even better than that of model adaptation, and we argue that in particular cases, such as rapidly varying distortions or limited computational or memory resources, feature compensation is more convenient. Furthermore, we show that feature-side and model-side approaches may be combined, outperforming either approach alone.
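
As a point of reference for the baselines named in the abstract, cepstral mean normalization can be sketched in a few lines. This is a generic textbook illustration, not the paper's compensation method or code; the function name `cmn`, the toy two-coefficient "features", and the constant `offset` are assumptions made for the example.

```python
from typing import List


def cmn(frames: List[List[float]]) -> List[List[float]]:
    """Cepstral mean normalization over one utterance.

    `frames` is a list of per-frame cepstral vectors (hypothetical toy
    input here). The mean of each coefficient over all frames is
    subtracted from that coefficient in every frame.
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Per-coefficient mean across the utterance.
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


# Toy data (assumed for illustration): three frames, two coefficients.
clean = [[1.0, 2.0], [3.0, 0.0], [5.0, 4.0]]

# A fixed channel is modeled here as a constant additive offset in the
# cepstral domain; this is an assumption of the sketch.
offset = [0.5, -1.5]
distorted = [[c + o for c, o in zip(f, offset)] for f in clean]

normalized_clean = cmn(clean)
normalized_distorted = cmn(distorted)
```

In this toy case `normalized_clean` and `normalized_distorted` coincide, because CMN cancels any offset that is constant across frames. When whole spectral regions are removed, the change in the cepstral features depends on the content of each frame and is no longer a constant offset, which is consistent with the abstract treating CMN as a comparison baseline rather than as the proposed remedy.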
