End-To-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings

Abstract

We present an end-to-end deep network model that performs meeting diarization from single-channel audio recordings. End-to-end diarization models have the advantage of handling speaker overlap and enabling straightforward handling of discriminative training, unlike traditional clustering-based diarization methods. The proposed system is designed to handle meetings with unknown numbers of speakers, using variable-number permutation-invariant cross-entropy based loss functions. We introduce several components that appear to help with diarization performance, including a local convolutional network followed by a global self-attention module, multitask transfer learning using a speaker identification component, and a sequential approach where the model is refined with a second stage. These are trained and validated on simulated meeting data based on LibriSpeech and LibriTTS datasets; final evaluations are done using LibriCSS, which consists of simulated meetings recorded using real acoustics via loudspeaker playback. The proposed model performs better than previously proposed end-to-end diarization models on these data.
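The abstract mentions permutation-invariant cross-entropy losses but does not give their form. As a rough illustration only, the sketch below shows a generic permutation-invariant binary cross-entropy for a fixed number of speakers, written in plain Python. It is not the authors' variable-number formulation. The function names (`binary_cross_entropy`, `permutation_invariant_bce`) and the toy data are assumptions made for this example.

```python
import itertools
import math


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over all frames and speakers.

    probs and labels are lists of frames; each frame holds one value
    per speaker (a probability in probs, a 0/1 activity flag in labels).
    """
    total, count = 0.0, 0
    for p_frame, y_frame in zip(probs, labels):
        for p, y in zip(p_frame, y_frame):
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count


def permutation_invariant_bce(probs, labels):
    """Return (lowest loss, best permutation) over all speaker orderings.

    Exhaustive search over permutations of the label columns; this is
    only practical for a small, fixed number of speakers.
    """
    n_speakers = len(labels[0])
    best_loss, best_perm = float("inf"), None
    for perm in itertools.permutations(range(n_speakers)):
        permuted = [[frame[i] for i in perm] for frame in labels]
        loss = binary_cross_entropy(probs, permuted)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy example (hypothetical data): three frames, two speakers.
# The reference labels list the speakers in the opposite order
# from the model outputs, and the middle frame contains overlap.
probs = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.9]]
labels = [[0, 1], [1, 1], [1, 0]]
loss, perm = permutation_invariant_bce(probs, labels)
```

Because each speaker has an independent activity output per frame, overlapping speech is represented simply as more than one active label in the same frame, and the minimum over permutations removes the dependence on speaker ordering.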

Bibliographic details

Source
IEEE International Conference on Acoustics, Speech and Signal Processing | 2021 | pp. 7183-7187 | 5 pages
Authors
Soumi Maiti; Hakan Erdogan; Kevin Wilson; Scott Wisdom; Shinji Watanabe; John R. Hershey;
Original format: PDF
Keywords
Training; Loudspeakers; Convolution; Transfer learning; Data models; Acoustics; Robustness;

Similar literature

1. Unsupervised deep feature embeddings for speaker diarization [J]. Rehan AHMAD, Syed ZUBAIR. Turkish Journal of Electrical Engineering and Computer Sciences. 2019, issue 4
2. Speaker diarization using autoassociative neural networks [J]. S. Jothilakshmi, V. Ramalingam, S. Palanivel. Engineering Applications of Artificial Intelligence. 2009, issue 4a5
3. State-of-the-art speaker recognition with neural network embeddings in NIST SRE18 and Speakers in the Wild evaluations [J]. Jesus Villalba, Nanxin Chen, David Snyder. Computer Speech and Language. 2020, issue Mara
4. BW-EDA-EEND: Streaming End-to-End Neural Speaker Diarization for a Variable Number of Speakers [C]. Eunjung Han, Chul Lee, Andreas Stolcke. IEEE International Conference on Acoustics, Speech and Signal Processing. 2021
5. Discriminative and generative approaches for long- and short-term speaker characteristics modeling: Application to speaker verification [D]. Dehak, Najim. 2009
6. Supervised Speaker Diarization Using Random Forests: A Tool for Psychotherapy Process Research [O]. Lukas Fürer, Nathalie Schenk, Volker Roth. 2020
7. Speaker Diarization using Deep Recurrent Convolutional Neural Networks for Speaker Embeddings [O]. Cyrta, Pawel; Trzciński, Tomasz; Stokowiec, Wojciech. 2017
