IEEE Automatic Speech Recognition and Understanding Workshop

CNN with Phonetic Attention for Text-Independent Speaker Verification



Abstract

Text-independent speaker verification imposes no constraints on the spoken content and usually needs long observations to make reliable predictions. In this paper, we propose two speaker embedding approaches that integrate phonetic information into an attention-based residual convolutional neural network (CNN). Phonetic features are extracted from the bottleneck layer of a pretrained acoustic model. In implicit phonetic attention (IPA), the phonetic features are projected by a transformation network into multi-channel feature maps and then combined with the raw acoustic features as the input to the CNN. In explicit phonetic attention (EPA), the phonetic features are fed directly to the attentive pooling layer through a separate 1-dimensional CNN that generates the attention weights. By incorporating spoken content and an attention mechanism, the system can not only distill the speaker-discriminant frames but also actively normalize phonetic variations. Multi-head attention and discriminative objectives are further studied to improve the system. Experiments on the VoxCeleb corpus show that the proposed system outperforms the state of the art by around 43% relative.
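To make the EPA idea concrete, the following is a minimal NumPy sketch of attentive pooling where the attention weights are produced by a separate 1-D convolution over phonetic (bottleneck) features rather than over the speaker stream itself. All shapes, the `tanh` score network, and the single-head softmax pooling are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w, b):
    # x: (T, C_in) time-major features; w: (k, C_in, C_out); b: (C_out,)
    # "same" padding along the time axis
    T = x.shape[0]
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    # slide a window of k frames and contract over (time, channel)
    out = np.stack(
        [np.tensordot(xp[t:t + k], w, axes=([0, 1], [0, 1])) + b for t in range(T)]
    )
    return out  # (T, C_out)

def epa_pooling(speaker_feats, phonetic_feats, w, b, v):
    """Explicit-phonetic-attention pooling (illustrative sketch).

    speaker_feats:  (T, D) frame-level speaker features from the CNN
    phonetic_feats: (T, P) bottleneck features from a pretrained acoustic model
    The attention scores are computed from the phonetic stream only,
    then used to pool the speaker stream into one utterance embedding.
    """
    h = np.tanh(conv1d(phonetic_feats, w, b))   # (T, H) hidden scores
    scores = h @ v                              # (T,) scalar score per frame
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                        # softmax over time
    return alpha @ speaker_feats, alpha         # (D,) embedding, (T,) weights

# Hypothetical sizes: 50 frames, 128-d speaker features, 40-d bottleneck features
T, D, P, H, k = 50, 128, 40, 32, 3
spk = rng.standard_normal((T, D))
phn = rng.standard_normal((T, P))
w = rng.standard_normal((k, P, H)) * 0.1
b = np.zeros(H)
v = rng.standard_normal(H)

emb, alpha = epa_pooling(spk, phn, w, b, v)
print(emb.shape, alpha.shape)
```

The key design point mirrored here is that the pooling weights depend only on the phonetic stream, so frames with speaker-discriminant phonetic content can be emphasized independently of the speaker features being pooled. A multi-head variant would simply learn several `(w, b, v)` sets and concatenate the resulting embeddings.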

