IEEE Automatic Speech Recognition and Understanding Workshop

CNN with Phonetic Attention for Text-Independent Speaker Verification



Abstract

Text-independent speaker verification imposes no constraints on the spoken content and usually needs long observations to make reliable predictions. In this paper, we propose two speaker embedding approaches that integrate phonetic information into an attention-based residual convolutional neural network (CNN). Phonetic features are extracted from the bottleneck layer of a pretrained acoustic model. In implicit phonetic attention (IPA), the phonetic features are projected by a transformation network into multi-channel feature maps and then combined with the raw acoustic features as the input to the CNN. In explicit phonetic attention (EPA), the phonetic features are fed directly to the attentive pooling layer through a separate 1-dimensional CNN that generates the attention weights. By incorporating spoken content and an attention mechanism, the system can not only distill the speaker-discriminant frames but also actively normalize phonetic variations. Multi-head attention and discriminative objectives are further studied to improve the system. Experiments on the VoxCeleb corpus show that the proposed system outperforms the state of the art by around 43% relative.
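To make the EPA idea concrete, the following is a minimal NumPy sketch of attentive pooling where the attention weights are produced by a separate 1-D convolution over phonetic (bottleneck) features rather than over the speaker stream itself. All shapes, the `tanh` score network, and the single-head softmax pooling are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w, b):
    # x: (T, C_in) time-major features; w: (k, C_in, C_out); b: (C_out,)
    # "same" padding along the time axis
    T = x.shape[0]
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    # slide a window of k frames and contract over (time, channel)
    out = np.stack(
        [np.tensordot(xp[t:t + k], w, axes=([0, 1], [0, 1])) + b for t in range(T)]
    )
    return out  # (T, C_out)

def epa_pooling(speaker_feats, phonetic_feats, w, b, v):
    """Explicit-phonetic-attention pooling (illustrative sketch).

    speaker_feats:  (T, D) frame-level speaker features from the CNN
    phonetic_feats: (T, P) bottleneck features from a pretrained acoustic model
    The attention scores are computed from the phonetic stream only,
    then used to pool the speaker stream into one utterance embedding.
    """
    h = np.tanh(conv1d(phonetic_feats, w, b))   # (T, H) hidden scores
    scores = h @ v                              # (T,) scalar score per frame
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                        # softmax over time
    return alpha @ speaker_feats, alpha         # (D,) embedding, (T,) weights

# Hypothetical sizes: 50 frames, 128-d speaker features, 40-d bottleneck features
T, D, P, H, k = 50, 128, 40, 32, 3
spk = rng.standard_normal((T, D))
phn = rng.standard_normal((T, P))
w = rng.standard_normal((k, P, H)) * 0.1
b = np.zeros(H)
v = rng.standard_normal(H)

emb, alpha = epa_pooling(spk, phn, w, b, v)
print(emb.shape, alpha.shape)
```

The key design point mirrored here is that the pooling weights depend only on the phonetic stream, so frames with speaker-discriminant phonetic content can be emphasized independently of the speaker features being pooled. A multi-head variant would simply learn several `(w, b, v)` sets and concatenate the resulting embeddings.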

