Conference: IEEE International Conference on Acoustics, Speech and Signal Processing

End-To-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings


Abstract

We present an end-to-end deep network model that performs meeting diarization from single-channel audio recordings. End-to-end diarization models have the advantage of handling speaker overlap and enabling straightforward handling of discriminative training, unlike traditional clustering-based diarization methods. The proposed system is designed to handle meetings with unknown numbers of speakers, using variable-number permutation-invariant cross-entropy based loss functions. We introduce several components that appear to help with diarization performance, including a local convolutional network followed by a global self-attention module, multitask transfer learning using a speaker identification component, and a sequential approach where the model is refined with a second stage. These are trained and validated on simulated meeting data based on LibriSpeech and LibriTTS datasets; final evaluations are done using LibriCSS, which consists of simulated meetings recorded using real acoustics via loudspeaker playback. The proposed model performs better than previously proposed end-to-end diarization models on these data.
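The permutation-invariant cross-entropy idea mentioned in the abstract can be illustrated with a small sketch. This is not the paper's implementation: it assumes a fixed number of output tracks, uses plain binary cross-entropy per frame, and searches all speaker permutations by brute force. The variable-number extension the authors describe is not specified in the abstract and is not reproduced here. The names `bce` and `pit_bce` and the toy data are hypothetical.

```python
import itertools
import math


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy over frames for one speaker track."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(pred)


def pit_bce(preds, targets):
    """Permutation-invariant loss (illustrative, fixed speaker count).

    preds, targets: lists of per-speaker activity tracks (speakers x frames).
    Returns (loss, perm), where perm[i] is the reference speaker
    assigned to model output i under the lowest-loss assignment.
    """
    n = len(preds)
    best_loss, best_perm = float("inf"), None
    for perm in itertools.permutations(range(n)):
        loss = sum(bce(preds[i], targets[j]) for i, j in enumerate(perm)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Two speakers, four frames; both speakers are active in the third frame.
targets = [[1, 1, 1, 0], [0, 0, 1, 1]]
# Hypothetical model outputs that list the speakers in swapped order.
preds = [[0.1, 0.2, 0.9, 0.8], [0.9, 0.8, 0.7, 0.1]]

loss, perm = pit_bce(preds, targets)
print(perm, round(loss, 3))
```

Because each speaker has its own activity track, overlapping speech is simply a frame where more than one track is active, which is how a frame-level formulation of this kind can represent overlap. The brute-force search grows factorially with the number of speakers, so it is only practical for small speaker counts.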

