Conference: IEEE International Conference on Acoustics, Speech and Signal Processing

An investigation of augmenting speaker representations to improve speaker normalisation for DNN-based speech recognition



Abstract

The conventional short-term interval features used by Deep Neural Networks (DNNs) do not carry longer-term information. This poses a challenge for training a speaker-independent (SI) DNN, since the short-term features do not provide sufficient information for the DNN to estimate the true, robust factors of speaker-level variation. The key to this problem is to obtain a sufficiently robust and informative speaker representation. This paper compares several speaker representations. First, a DNN speaker classifier is used to extract bottleneck features as the speaker representation, called the Bottleneck Speaker Vector (BSV). To further improve the robustness of this representation, a first-order Bottleneck Speaker Super Vector (BSSV) is also proposed, in which the BSV is expanded into a super vector space by incorporating the phoneme posterior probabilities. Finally, a more fine-grained speaker representation based on FMLLR-shifted features is examined. Experimental results on the WSJ0 and WSJ1 datasets show that the proposed speaker representations are useful for normalising speaker effects in robust DNN-based automatic speech recognition. The best performance is achieved when both the BSSV and the FMLLR-shifted representations are used for augmentation, yielding relative performance gains of 10.0% to 15.3% over the SI DNN baseline.
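The abstract does not give the formula for the BSSV, so the following is only a minimal sketch under stated assumptions. It reads "first-order" as a posterior-weighted mean of frame-level bottleneck vectors per phoneme class, concatenated across classes, and it assumes the resulting speaker vector is appended to every frame's acoustic features. The function names, array shapes, and the `eps` smoothing term are hypothetical and are not taken from the paper.

```python
import numpy as np


def first_order_supervector(bottleneck, posteriors, eps=1e-8):
    """Assumed first-order statistic, not the paper's confirmed definition.

    bottleneck: (T, D) frame-level bottleneck vectors for one speaker.
    posteriors: (T, P) phoneme posteriors per frame (rows sum to 1).
    Returns a (P * D,) vector: one posterior-weighted mean per phoneme.
    """
    occupancy = posteriors.sum(axis=0)        # (P,) soft frame count per phoneme
    weighted_sum = posteriors.T @ bottleneck  # (P, D) posterior-weighted sums
    means = weighted_sum / (occupancy[:, None] + eps)
    return means.reshape(-1)


def augment_frames(features, speaker_vector):
    """Append the same speaker vector to every frame (assumed augmentation).

    features: (T, F) acoustic features; speaker_vector: (S,).
    Returns a (T, F + S) array.
    """
    tiled = np.tile(speaker_vector, (features.shape[0], 1))
    return np.concatenate([features, tiled], axis=1)
```

With one-hot posteriors this reduces to a per-phoneme average of the bottleneck vectors. Soft posteriors let every frame contribute to every phoneme slot in proportion to its probability.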
