Speaker Diarization Using Deep Recurrent Convolutional Neural Networks for Speaker Embeddings

Abstract

In this paper we propose a new speaker diarization method that employs a deep learning architecture to learn speaker embeddings. In contrast to traditional approaches that build their speaker embeddings from hand-crafted spectral features, we train a recurrent convolutional neural network for this purpose, applied directly to magnitude spectrograms. To compare our approach with the state of the art, we collect and publicly release an additional dataset of over 6 hours of fully annotated broadcast material. Evaluation on the new dataset and three other benchmark datasets shows that the proposed method significantly outperforms competing approaches, reducing the diarization error rate by over 30% relative to the baseline.
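To make the architecture concrete, below is a minimal PyTorch sketch of a recurrent convolutional embedder operating directly on magnitude spectrograms: 2-D convolutions over the time-frequency plane followed by a GRU whose final hidden state is projected into a fixed-size speaker embedding. The layer sizes, the GRU choice, and the L2 normalization are illustrative assumptions, not the exact configuration reported in the paper.

```python
import torch
import torch.nn as nn

class RecurrentConvEmbedder(nn.Module):
    """Illustrative recurrent convolutional speaker embedder (not the paper's exact model):
    2-D convolutions over a magnitude spectrogram, a GRU over time, and a linear
    projection to a fixed-size, L2-normalized embedding."""

    def __init__(self, n_freq_bins: int = 128, embedding_dim: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d((2, 1)),            # pool along frequency only, keep time resolution
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d((2, 1)),
        )
        conv_freq = n_freq_bins // 4         # frequency bins remaining after two poolings
        self.gru = nn.GRU(64 * conv_freq, embedding_dim, batch_first=True)
        self.fc = nn.Linear(embedding_dim, embedding_dim)

    def forward(self, spectrogram: torch.Tensor) -> torch.Tensor:
        # spectrogram: (batch, freq_bins, time_frames) of magnitude values
        x = self.conv(spectrogram.unsqueeze(1))   # (batch, 64, freq/4, time)
        x = x.permute(0, 3, 1, 2).flatten(2)      # (batch, time, 64 * freq/4)
        _, hidden = self.gru(x)                   # final hidden state summarizes the segment
        embedding = self.fc(hidden.squeeze(0))    # (batch, embedding_dim)
        return nn.functional.normalize(embedding, dim=-1)

if __name__ == "__main__":
    model = RecurrentConvEmbedder()
    batch = torch.randn(4, 128, 200).abs()        # 4 speech segments, 128 bins, 200 frames
    print(model(batch).shape)                     # torch.Size([4, 256])
```

In a diarization pipeline, embeddings like these would be extracted per speech segment and then clustered so that segments from the same speaker group together; the clustering stage is outside the scope of this sketch.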
