Conference: IEEE International Conference on Acoustics, Speech and Signal Processing

Improving Deep CNN Networks with Long Temporal Context for Text-Independent Speaker Verification

Abstract

Deep CNNs have shown great success in a range of text-independent speaker recognition tasks. In this paper, we explore two approaches to modeling long temporal context in order to improve the performance of ResNet models. The first simply integrates utterance-level mean and variance normalization into the ResNet architecture. The second combines a BLSTM and a ResNet in one unified architecture. The BLSTM layers model long-range, supposedly phonetically aware, context information, which could help the ResNet learn the optimal attention weights and suppress environmental variations. The BLSTM outputs are projected into multi-channel feature maps and fed into the ResNet. Experiments on the VoxCeleb1 and internal MS-SV tasks show that, with attentive pooling, the proposed approaches achieve a relative improvement in equal error rate (EER) of up to 23-28% over a well-trained ResNet.
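The three operations the abstract names (utterance-level mean and variance normalization, attentive pooling, and relative EER improvement) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not say where the normalization sits inside the ResNet or how the attention scores are computed, so the function names, the list-of-frames layout, the `eps` term, and the assumption that per-frame scores are supplied by some upstream layer are all hypothetical.

```python
import math


def utterance_mvn(frames, eps=1e-8):
    """Normalize each feature dimension to zero mean and unit variance,
    using statistics computed over every frame of one utterance.

    `frames` is a list of equal-length feature vectors (one per time step).
    `eps` is an assumed small constant that avoids division by zero.
    """
    num_frames = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / num_frames + eps)
        for d in range(dim)
    ]
    return [[(f[d] - means[d]) / stds[d] for d in range(dim)] for f in frames]


def attentive_pool(frames, scores):
    """Collapse a variable-length utterance into one vector.

    The per-frame `scores` (assumed to come from a learned scoring layer)
    are turned into weights with a softmax over time, and the frames are
    averaged with those weights.
    """
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frames[0])
    return [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]


def relative_improvement(baseline_eer, new_eer):
    """Relative EER reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer
```

As a usage illustration with made-up numbers that do not come from the paper, `relative_improvement(4.0, 3.0)` describes a system whose EER drops from 4.0% to 3.0%, a 25% relative improvement, which is the kind of figure the 23-28% range refers to. Calling `attentive_pool` with identical scores for every frame reduces it to plain mean pooling over time.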