IEEE International Conference on Acoustics, Speech and Signal Processing

Improving Deep CNN Networks with Long Temporal Context for Text-Independent Speaker Verification



Abstract

Deep CNN networks have shown great success in various tasks for text-independent speaker recognition. In this paper, we explore two approaches to modeling long temporal context to improve the performance of ResNet networks. The first approach simply integrates utterance-level mean and variance normalization into the ResNet architecture. Second, we combine a BLSTM and a ResNet into one unified architecture. The BLSTM layers model long-range, presumably phonetically aware context information, which can help the ResNet learn optimal attention weights and suppress environmental variations. The BLSTM outputs are projected into multi-channel feature maps and fed into the ResNet network. Experiments on the VoxCeleb1 and internal MS-SV tasks show that, with attentive pooling, the proposed approaches achieve a 23-28% relative improvement in EER over a well-trained ResNet.
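The two ingredients the abstract names for the first approach, utterance-level mean/variance normalization and attentive statistics pooling, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; shapes, the single attention parameter vector `w`, and all function names are illustrative assumptions.

```python
import numpy as np

def utterance_mvn(feats, eps=1e-8):
    """Utterance-level mean and variance normalization: each utterance's
    (frames x dims) feature matrix is normalized by its own per-dimension
    mean and standard deviation before entering the network."""
    mean = feats.mean(axis=0, keepdims=True)
    std = feats.std(axis=0, keepdims=True)
    return (feats - mean) / (std + eps)

def attentive_pooling(frames, w, eps=1e-8):
    """Attentive statistics pooling (illustrative form): a softmax over
    per-frame scores gives attention weights, which produce a weighted
    mean and standard deviation concatenated into one utterance vector.
    `w` stands in for learned attention parameters (one weight per dim)."""
    scores = frames @ w                       # (T,) per-frame scores
    scores = np.exp(scores - scores.max())    # numerically stable softmax
    alpha = scores / scores.sum()             # attention weights over frames
    mean = alpha @ frames                     # weighted mean, shape (D,)
    var = alpha @ (frames - mean) ** 2        # weighted variance, shape (D,)
    return np.concatenate([mean, np.sqrt(var + eps)])
```

With 40-dimensional features over T frames, `utterance_mvn` yields zero-mean, unit-variance input per dimension, and `attentive_pooling` returns an 80-dimensional utterance-level embedding (weighted mean plus weighted standard deviation).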
