Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
I-Vectors and Structured Neural Networks for Rapid Adaptation of Acoustic Models


Abstract

The adaptation of deep neural network (DNN) acoustic models has attracted considerable interest in recent years, as these models have become the state of the art in automatic speech recognition. This work focuses on approaches that allow rapid and robust adaptation of such models. First, i-vectors are added to the DNN input as speaker-informed features. An informative prior is introduced into i-vector estimation to improve robustness to limited adaptation data. I-vectors are then combined with a structured adaptive DNN, the multibasis adaptive neural network (MBANN), and the complementarity of these adaptation techniques is investigated. Moreover, i-vectors are used to predict the MBANN transforms, which avoids the initial decoding pass and alignment. These approaches are evaluated on a U.S. English Broadcast News (BN) transcription task with two distinct sets of test data. The first set, drawn from the BN task and BN-style YouTube videos, is acoustically matched to the training data, while the second set consists of acoustically mismatched YouTube videos of diverse context. The performance gains from these schemes are found to be sensitive to the level of mismatch between training and test sets. The MBANN system combined with i-vector input achieves the best performance on the BN test sets. The i-vector-based predictive MBANN scheme is shown to be more robust to acoustically mismatched conditions and outperforms the other adaptation schemes in such scenarios.
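As a rough illustration of the first step in the abstract, adding an i-vector to the DNN input, the sketch below tiles one speaker-level i-vector across all frames of an utterance and concatenates it to the acoustic features. It is a minimal sketch and not the authors' implementation: the function name `append_ivector`, the 40-dimensional frames, and the 3-dimensional i-vector are all assumptions chosen for illustration, and the i-vector estimation itself is not shown.

```python
import numpy as np


def append_ivector(frames: np.ndarray, ivector: np.ndarray) -> np.ndarray:
    """Append one speaker-level i-vector to every acoustic frame.

    frames:  (num_frames, feat_dim) acoustic features for one utterance
    ivector: (ivec_dim,) i-vector for the speaker (estimation not shown)
    returns: (num_frames, feat_dim + ivec_dim) augmented DNN input
    """
    if frames.ndim != 2 or ivector.ndim != 1:
        raise ValueError("expected 2-D frames and a 1-D i-vector")
    # Repeat the same i-vector for each frame without copying it yet.
    tiled = np.broadcast_to(ivector, (frames.shape[0], ivector.shape[0]))
    # Concatenate along the feature axis.
    return np.concatenate([frames, tiled], axis=1)


# Hypothetical sizes: 5 frames of 40-dim features, a 3-dim i-vector.
frames = np.zeros((5, 40))
ivector = np.arange(3, dtype=float)
augmented = append_ivector(frames, ivector)
print(augmented.shape)
```

Because the same i-vector is attached to every frame, the network sees a constant speaker descriptor alongside the time-varying acoustic features.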
