Utterance-level Aggregation for Speaker Recognition in the Wild


Abstract

The objective of this paper is speaker recognition 'in the wild', where utterances may be of variable length and may also contain irrelevant signals. Crucial elements in the design of deep networks for this task are the type of trunk (frame-level) network and the method of temporal aggregation. We propose a powerful speaker recognition deep network, using a 'thin-ResNet' trunk architecture and a dictionary-based NetVLAD or GhostVLAD layer to aggregate features across time, that can be trained end-to-end. We show that our network achieves state-of-the-art performance by a significant margin on the VoxCeleb1 test set for speaker recognition, whilst requiring fewer parameters than previous methods. We also investigate the effect of utterance length on performance, and conclude that for 'in the wild' data, a longer length is beneficial.
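
To make the aggregation step concrete, here is a minimal sketch of NetVLAD-style pooling with optional ghost clusters, written in plain Python. It is not the authors' implementation: the function name `ghostvlad_aggregate`, its parameters, and the normalisation order are assumptions based on the general NetVLAD and GhostVLAD formulations, and in a trained network the centers, weights, and biases would be learned end-to-end rather than supplied by hand.

```python
import math

def ghostvlad_aggregate(frames, centers, weights, biases, num_ghost=0):
    """Pool T frame-level vectors (each of length D) into one K*D descriptor.

    Hypothetical helper for illustration only.
    centers, weights: K+G rows of length D; biases: K+G values.
    The last num_ghost (G) clusters are treated as ghost clusters.
    """
    num_real = len(centers) - num_ghost
    dim = len(centers[0])
    vlad = [[0.0] * dim for _ in range(num_real)]

    for x in frames:
        # Soft-assign the frame to all K+G clusters with a stable softmax.
        logits = [sum(wi * xi for wi, xi in zip(w, x)) + b
                  for w, b in zip(weights, biases)]
        peak = max(logits)
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        # Accumulate weighted residuals for the real clusters only; ghost
        # clusters absorb assignment mass but are left out of the output.
        for k in range(num_real):
            a = exps[k] / total
            for d in range(dim):
                vlad[k][d] += a * (x[d] - centers[k][d])

    # L2-normalise each cluster's residual, then the flattened descriptor.
    flat = []
    for row in vlad:
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        flat.extend(v / norm for v in row)
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat]
```

The length of the returned descriptor depends only on the number of real clusters and the feature dimension, not on the number of frames, which is how utterances of different lengths can be mapped to a fixed-size embedding.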


Bibliographic record

Source
IEEE International Conference on Acoustics, Speech and Signal Processing | 2019 | pp. 5791-5795 | 5 pages
Authors
Weidi Xie; Arsha Nagrani; Joon Son Chung; Andrew Zisserman
Original format: PDF
Keywords
learning (artificial intelligence); neural nets; speaker recognition

