Improving Deep CNN Networks with Long Temporal Context for Text-Independent Speaker Verification

Abstract

Deep CNNs have shown great success in various text-independent speaker recognition tasks. In this paper, we explore two approaches to modeling long temporal context in order to improve the performance of ResNet networks. The first approach simply integrates utterance-level mean and variance normalization into the ResNet architecture. The second combines a BLSTM and a ResNet into one unified architecture. The BLSTM layers model long-range, supposedly phonetically aware, context information, which could help the ResNet learn optimal attention weights and suppress environmental variations. The BLSTM outputs are projected into multi-channel feature maps and fed into the ResNet. Experiments on VoxCeleb1 and the internal MS-SV tasks show that, with attentive pooling, the proposed approaches achieve up to 23-28% relative improvement in EER over a well-trained ResNet.
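
The abstract names its building blocks only at a high level. The pure-Python sketch below illustrates four of them under stated assumptions: utterance-level mean and variance normalization, projecting BLSTM outputs into multi-channel feature maps, attentive pooling, and the EER metric with relative improvement. It is not the authors' implementation. Where the normalization sits inside the ResNet, the channel-major `[channels][T][freq]` layout of the projected maps, and the single-head additive form of attention are all assumptions, and every function and parameter name here (`utterance_mvn`, `project_to_channels`, `attentive_pool`, `compute_eer`, `relative_improvement`, `proj`, `w`, `b`, `v`) is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # rows are frames: T x D


def utterance_mvn(frames: Matrix, eps: float = 1e-8) -> Matrix:
    """Utterance-level mean and variance normalization: every feature
    dimension is shifted and scaled using statistics over all T frames."""
    num_frames, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / num_frames + eps)
            for d in range(dim)]
    return [[(f[d] - means[d]) / stds[d] for d in range(dim)] for f in frames]


def project_to_channels(blstm_out: Matrix, proj: Matrix, channels: int) -> List[Matrix]:
    """Hypothetical projection: a bias-free linear layer maps each BLSTM frame
    (size H) to channels * freq values, reshaped to [channels][T][freq]."""
    freq = len(proj) // channels
    projected = [[sum(p * h for p, h in zip(row, frame)) for row in proj]
                 for frame in blstm_out]  # T x (channels * freq)
    return [[frame[c * freq:(c + 1) * freq] for frame in projected]
            for c in range(channels)]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_pool(frames: Matrix, w: Matrix, b: Sequence[float],
                   v: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Single-head additive attention (an assumed form):
    score_t = v . tanh(W h_t + b), alpha = softmax(scores),
    output = sum_t alpha_t * h_t. Returns (pooled vector, alpha)."""
    scores = []
    for h in frames:
        hidden = [math.tanh(sum(wij * hj for wij, hj in zip(row, h)) + bi)
                  for row, bi in zip(w, b)]
        scores.append(sum(vi * zi for vi, zi in zip(v, hidden)))
    alpha = softmax(scores)
    dim = len(frames[0])
    pooled = [sum(a * h[d] for a, h in zip(alpha, frames)) for d in range(dim)]
    return pooled, alpha


def compute_eer(scores: Sequence[float], labels: Sequence[int]) -> float:
    """EER via a threshold sweep over observed scores (no interpolation).
    label 1 = target trial, 0 = non-target; score >= threshold means accept."""
    targets = [s for s, y in zip(scores, labels) if y == 1]
    nontargets = [s for s, y in zip(scores, labels) if y == 0]
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(scores)):
        far = sum(s >= t for s in nontargets) / len(nontargets)  # false acceptances
        frr = sum(s < t for s in targets) / len(targets)  # false rejections
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


def relative_improvement(baseline_eer: float, new_eer: float) -> float:
    """Relative EER reduction; 0.25 means a 25% lower EER than the baseline."""
    return (baseline_eer - new_eer) / baseline_eer
```

For instance, `relative_improvement(2.0, 1.5)` returns `0.25`: a hypothetical drop from 2.0% to 1.5% EER is a 25% relative improvement, the kind of quantity the abstract's 23-28% figure expresses. The EER helper picks the observed threshold where the false-acceptance and false-rejection rates are closest, a simple estimate that can differ from interpolated EER on small trial lists.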

Bibliographic record

Source
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 2020 | pp. 6824-7443 | 5 pages
Authors
Yong Zhao; Tianyan Zhou; Zhuo Chen; Jian Wu;
Original format: PDF
Chinese Library Classification: TN912-53
Keywords
speaker verification; LSTM; CNN; attentive pooling;

Similar documents

1. Text-Independent Speaker Verification Using Minimal Resource Allocation Networks [J]. Li Guojie, P. Saratchandran, N. Sundararajan. International Journal of Neural Systems, 2004, No. 6.
2. Deep multi-metric learning for text-independent speaker verification [J]. Xu Jiwei, Wang Xinggang, Feng Bin. Neurocomputing, 2020, Oct. 14.
3. Efficient text-independent speaker verification with structural Gaussian mixture models and neural network [J]. Bing Xiang, T. Berger. IEEE Transactions on Speech and Audio Processing, 2003, No. 5.
4. Improving Deep CNN Networks with Long Temporal Context for Text-Independent Speaker Verification [C]. Yong Zhao, Tianyan Zhou, Zhuo Chen. IEEE International Conference on Acoustics, Speech and Signal Processing, 2020.
5. Deep Neural Network Based Speaker Verification Under Domain Mismatched Conditions [D]. Chunlei Zhang. 2019.
6. Predicting improved protein conformations with a temporal deep recurrent neural network [O]. Erik Pfeiffenberger, Paul A. Bates. 2012.
7. RawNet: Advanced End-to-End Deep Neural Network Using Raw Waveforms for Text-Independent Speaker Verification [O]. Jee-weon Jung, Hee-Soo Heo, Ju-ho Kim. 2019.