Journal: International Journal of Speech Technology

Robust features for multilingual acoustic modeling


Abstract

In this paper, we propose a technique for deriving robust features for multilingual acoustic modeling with hidden Markov model-Gaussian mixture model (HMM-GMM) systems. We achieve this by discriminatively combining the phonetic contexts of the target languages (the languages in the multilingual system). Phonetic context is captured using a wide temporal context of the features, and the dimensionality of the resulting feature set is reduced to suit the HMM-GMM implementation using a neural network with a bottleneck in one of its hidden layers. The output of the bottleneck layer, taken before the non-linearity, is the new feature. Since the features are optimized for the target languages in the multilingual recognizer, they are referred to as Target Languages Oriented Features (TLOF). We perform our experiments on two of the most widely spoken Indian languages, Hindi and Tamil. TLOF offers significant performance improvements over both monolingual and multilingual phone recognizers that use Mel frequency cepstral coefficients (MFCC), which indicates that TLOF can help share data across languages. TLOF also improves the performance of monolingual acoustic models compared to systems using MFCC.
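The feature-extraction path described in the abstract (wide temporal context, then a bottleneck layer read out before its non-linearity) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 39-dimensional input, the 9-frame window, the single 500-unit tanh hidden layer, the 39-unit bottleneck, and the helper names `splice_frames` and `bottleneck_features`. The weights are random placeholders; in the described approach the network would first be trained discriminatively on the target languages.

```python
import numpy as np


def splice_frames(feats, context):
    """Stack each frame with `context` frames on either side.

    Utterance edges are padded by repeating the first and last frame.
    """
    num_frames, _ = feats.shape
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    # Row t of the result is frames t-context .. t+context, concatenated.
    return np.hstack([padded[i:i + num_frames] for i in range(2 * context + 1)])


def bottleneck_features(spliced, w_in, b_in, w_bn, b_bn):
    """Return the linear bottleneck activations (before any non-linearity)."""
    hidden = np.tanh(spliced @ w_in + b_in)  # hidden layer (assumed tanh)
    return hidden @ w_bn + b_bn              # bottleneck pre-activation


rng = np.random.default_rng(0)

# Hypothetical input: 100 frames of 39-dimensional MFCC-like features.
mfcc = rng.standard_normal((100, 39))
context = 4                                  # assumed 9-frame window
spliced = splice_frames(mfcc, context)       # shape (100, 351)

# Random placeholder weights; a real system would use trained parameters.
hidden_dim, bn_dim = 500, 39                 # assumed layer sizes
w_in = rng.standard_normal((spliced.shape[1], hidden_dim)) * 0.01
b_in = np.zeros(hidden_dim)
w_bn = rng.standard_normal((hidden_dim, bn_dim)) * 0.01
b_bn = np.zeros(bn_dim)

tlof_like = bottleneck_features(spliced, w_in, b_in, w_bn, b_bn)
print(spliced.shape, tlof_like.shape)
```

In this sketch the bottleneck width is set equal to the input dimensionality so that the resulting features could replace MFCC vectors in an HMM-GMM front end without changing its dimensionality; the paper's actual sizes may differ.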
