An investigation of augmenting speaker representations to improve speaker normalisation for DNN-based speech recognition

Abstract

The conventional short-term interval features used by Deep Neural Networks (DNNs) lack the ability to capture longer-term information. This poses a challenge for training a speaker-independent (SI) DNN, since the short-term features do not provide sufficient information for the DNN to robustly estimate the real factors of speaker-level variation. The key to this problem is to obtain a sufficiently robust and informative speaker representation. This paper compares several speaker representations. First, a DNN speaker classifier is used to extract bottleneck features as the speaker representation, called the Bottleneck Speaker Vector (BSV). To further improve the robustness of this representation, a first-order Bottleneck Speaker Super Vector (BSSV) is also proposed, in which the BSV is expanded into a super vector space by incorporating the phoneme posterior probabilities. Finally, a more fine-grained speaker representation based on FMLLR-shifted features is examined. Experimental results on the WSJ0 and WSJ1 datasets show that the proposed speaker representations are useful for normalising speaker effects in robust DNN-based automatic speech recognition. The best performance is achieved by augmenting with both the BSSV and the FMLLR-shifted representations, yielding relative performance gains of 10.0% to 15.3% over the SI DNN baseline.
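
The abstract gives no formulas, so the following is only a minimal sketch of how such representations might be computed. It assumes that a BSV is the average of frame-level bottleneck activations, and that "first-order" means a posterior-weighted mean of those activations per phoneme, concatenated into one long vector. Both are assumptions made for illustration, not the authors' stated method. All function names are hypothetical, and the FMLLR-shifted representation is not covered.

```python
import numpy as np

def bottleneck_speaker_vector(bn_feats):
    """Assumed BSV: mean of frame-level bottleneck features, (T, D) -> (D,)."""
    return bn_feats.mean(axis=0)

def first_order_supervector(bn_feats, phone_post, eps=1e-8):
    """Assumed first-order super vector.

    bn_feats:   (T, D) bottleneck features from a speaker classifier.
    phone_post: (T, P) phoneme posteriors per frame (rows sum to 1).
    Returns a (P * D,) vector of posterior-weighted per-phoneme means.
    """
    occupancy = phone_post.sum(axis=0)         # (P,) soft frame counts
    weighted_sum = phone_post.T @ bn_feats     # (P, D) first-order statistics
    means = weighted_sum / (occupancy[:, None] + eps)
    return means.reshape(-1)

def augment_frames(acoustic_feats, speaker_vec):
    """Append the same speaker vector to every acoustic frame: (T, F) -> (T, F + S)."""
    tiled = np.tile(speaker_vec, (acoustic_feats.shape[0], 1))
    return np.hstack([acoustic_feats, tiled])
```

In this sketch the augmented frames would be fed to the acoustic DNN in place of the raw acoustic features, so that every frame carries the same utterance- or speaker-level vector alongside its short-term features.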

Bibliographic details

Source: IEEE International Conference on Acoustics, Speech and Signal Processing | 2015 | 4 pages
Authors: H. Huang; K. C. Sim
Original format: PDF
Chinese Library Classification: TN912-53
Keywords: augmented speaker representation; deep neural network; speaker normalisation; speech recognition

Similar documents

1. An acoustic-phonetic-based speaker adaptation technique for improving speaker-independent continuous speech recognition [J]. Yunxin Zhao. IEEE Transactions on Speech and Audio Processing, 1994, No. 3.
2. Restricted Boltzmann machines for vector representation of speech in speaker recognition [J]. Omid Ghahabi, Javier Hernando. Computer Speech and Language, 2018 (January).
3. Session compensation using binary speech representation for speaker recognition [J]. Gabriel Hernandez-Sierra, Jose R. Calvo, Jean-Francois Bonastre. Pattern Recognition Letters, 2014 (November).
4. An investigation of augmenting speaker representations to improve speaker normalisation for DNN-based speech recognition [C]. Huang Hengguan, Sim Khe Chai. IEEE International Conference on Acoustics, Speech and Signal Processing, 2015.
5. Neural Network Based Representation Learning and Modeling for Speech and Speaker Recognition [D]. Guo, Jinxi. 2019.
6. Selective cortical representation of attended speaker in multi-talker speech perception [O]. Nima Mesgarani, Edward F. Chang.
7. A study of LSF representation for speaker dependent and speaker independent HMM-based speech recognition systems [O]. K. K. Paliwal. 1990.
