Attention Mechanism in Speaker Recognition: What Does it Learn in Deep Speaker Embedding?

Abstract

This paper presents an experimental study on deep speaker embedding with an attention mechanism, which has been found to be a powerful representation learning technique in speaker recognition. In this framework, an attention model works as a frame selector: it computes an attention weight for each frame-level feature vector, and the pooling layer of the speaker embedding network uses these weights to produce an utterance-level representation. In general, the attention model is trained together with the speaker embedding network on a single objective function, so the two components are tightly bound to one another. In this paper, we consider the possibility that the attention model might be decoupled from its parent network and used to assist other speaker embedding networks and even conventional i-vector extractors. This possibility is demonstrated through a series of experiments on a NIST Speaker Recognition Evaluation (SRE) task, with a 9.0% EER reduction and a 3.8% minC_primary reduction when the attention weights are applied to i-vector extraction. Another experiment shows that DNN-based soft voice activity detection (VAD) can be effectively combined with the attention mechanism to further reduce minC_primary by 6.6% and 1.6% in the deep speaker embedding and i-vector systems, respectively.
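The frame-selection idea described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the single-layer tanh scorer, all function names, the multiply-and-renormalise rule for combining attention weights with soft VAD posteriors, and the way the weights scale the i-vector Baum-Welch statistics are assumptions made for the example. The UBM frame posteriors `gamma` are taken as given.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over per-frame scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(frames: Sequence[Vector], W: Sequence[Vector],
                      b: Vector, v: Vector) -> Vector:
    """Hypothetical scorer e_t = v . tanh(W h_t + b), softmax-normalised over frames."""
    scores = []
    for h in frames:
        hidden = [math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
                  for row, b_i in zip(W, b)]
        scores.append(sum(v_i * z_i for v_i, z_i in zip(v, hidden)))
    return softmax(scores)


def attentive_mean_pool(frames: Sequence[Vector], alpha: Sequence[float]) -> Vector:
    """Utterance-level vector: attention-weighted mean of the frame-level vectors."""
    dim = len(frames[0])
    return [sum(a * h[d] for a, h in zip(alpha, frames)) for d in range(dim)]


def combine_with_soft_vad(alpha: Sequence[float], vad: Sequence[float]) -> Vector:
    """Assumed rule: scale each weight by its speech posterior, then renormalise.

    Requires at least one frame with a non-zero combined weight.
    """
    prod = [a * p for a, p in zip(alpha, vad)]
    total = sum(prod)
    return [x / total for x in prod]


def weighted_baum_welch(frames: Sequence[Vector], gamma: Sequence[Vector],
                        alpha: Sequence[float]) -> Tuple[Vector, List[Vector]]:
    """Zeroth- and (uncentred) first-order statistics, frame t scaled by T * alpha_t.

    gamma[t][c] is the given UBM posterior of component c for frame t.
    Scaling by T means uniform attention (alpha_t = 1/T) reproduces the
    ordinary, unweighted statistics used for i-vector extraction.
    """
    T, C, D = len(frames), len(gamma[0]), len(frames[0])
    N = [0.0] * C                       # N_c = sum_t w_t * gamma_tc
    F = [[0.0] * D for _ in range(C)]   # F_c = sum_t w_t * gamma_tc * x_t
    for t in range(T):
        w = T * alpha[t]
        for c in range(C):
            g = w * gamma[t][c]
            N[c] += g
            for d in range(D):
                F[c][d] += g * frames[t][d]
    return N, F
```

Multiplying the normalised weights by the frame count T is one way to keep the weighted statistics on the same scale as the unweighted ones; other normalisations are possible, and this sketch does not claim to match the paper's choice.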

Bibliographic Details

Source
2018 IEEE Spoken Language Technology Workshop | 2018 | pp. 1052-1059 | 8 pages
Conference location: Athens, Greece
Authors
Qiongqiong Wang; Koji Okabe; Kong Aik Lee; Hitoshi Yamamoto; Takafumi Koshinaka
Affiliations

Biometrics Research Laboratories, NEC Corporation, Japan (all five authors)

Original format: PDF
Text language: English
Keywords
Feature extraction; Speaker recognition; Speech recognition; Neural networks; Acoustics; Standards; Computational modeling;

Similar Literature

1. State-of-the-art speaker recognition with neural network embeddings in NIST SRE18 and Speakers in the Wild evaluations [J]. Jesus Villalba, Nanxin Chen, David Snyder. Computer Speech and Language, 2020, issue Mara.
2. TEnet: target speaker extraction network with accumulated speaker embedding for automatic speech recognition [J]. Li Wenjie, Zhang Pengyuan, Yan Yonghong. Electronics Letters, 2019, no. 14.
3. Self-attention based speaker recognition using Cluster-Range Loss [J]. Bian Tengyue, Chen Fangzhou, Xu Li. Neurocomputing, 2019, issue Nova27.
4. Attention Mechanism in Speaker Recognition: What Does it Learn in Deep Speaker Embedding? [C]. Qiongqiong Wang, Koji Okabe, Kong Aik Lee. Spoken Language Technology Workshop, 2018.
5. Specificity of the b Test, Dot Counting Test, Rey 15-Item Test Plus Recognition, and Rey Word Recognition Test in Monolingual Spanish Speakers Embedded Measure of Effort [D]. Robles, Luz Alehida. 2013.
6. Adversarially Learned Total Variability Embedding for Speaker Recognition with Random Digit Strings [O]. Woo Hyun Kang, Nam Soo Kim. 2019.
7. Speaker-Aware Training of Attention-Based End-to-End Speech Recognition Using Neural Speaker Embeddings [O]. Aku Rouhe, Tuomas Kaseva, Mikko Kurimo. 2020.
8. Investigation of Speaker-Independent Word Recognition Using Multiple Features, Decision Mechanisms, and Template Sets [R]. Brusuelas, M. A. 1986.