IEEE International Conference on Acoustics, Speech and Signal Processing
Pre-training of Speaker Embeddings for Low-latency Speaker Change Detection in Broadcast News

Abstract

In this work, we investigate pre-training of neural-network-based speaker embeddings for low-latency speaker change detection. Our proposed system takes two speech segments, generates embeddings using shared Siamese layers, and then classifies the concatenated embeddings according to whether the two segments were spoken by the same speaker. We investigate pre-training of the embedding layers based on gender classification, contrastive loss, and triplet loss, as well as joint training of the embedding layers with a same/different classifier. Training is performed on 2-second single-speaker segments derived from the ground-truth speaker segmentation of broadcast news data. At test time, however, we use the detection system in a practical low-latency setting for annotating automatic closed captions. In contrast to training, test pairs are created around segmentation boundaries produced by automatic speech recognition (ASR). The ASR segments are often shorter than 2 seconds, which causes a duration mismatch during testing. In our experiments, although the baseline i-vector-based classifier performs well, the proposed triplet-loss-based pre-training followed by joint training provides a 7-50% relative F-measure improvement in matched and mismatched conditions. In addition, performance degrades less severely for network-based embeddings than for i-vectors in the variable-duration test conditions.
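
The pre-training objectives and the Siamese pairing named in the abstract can be illustrated with a minimal sketch. The abstract does not give the network architecture, distance measure, margin, or feature dimensions, so everything below is an assumption made for illustration: a single shared tanh layer stands in for the embedding layers, distances are Euclidean, the margins are arbitrary, and the function names (`embed`, `triplet_loss`, `contrastive_loss`, `pair_features`) are hypothetical rather than taken from the paper.

```python
import math


def embed(segment, weights):
    """Toy shared (Siamese) embedding: the same weights are applied to every segment.

    Assumption: one tanh layer; the paper's actual embedding layers are not specified here.
    """
    return [math.tanh(sum(w * x for w, x in zip(row, segment))) for row in weights]


def sq_dist(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Common triplet hinge: the same-speaker pair should be closer than the
    different-speaker pair by at least `margin` (margin value is an assumption)."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def contrastive_loss(a, b, same_speaker, margin=1.0):
    """Common contrastive form: pull same-speaker pairs together and push
    different-speaker pairs at least `margin` apart."""
    d = math.sqrt(sq_dist(a, b))
    if same_speaker:
        return d ** 2
    return max(0.0, margin - d) ** 2


def pair_features(left, right, weights):
    """Embed both segments with the shared weights and concatenate the results,
    giving the input for a same/different classifier."""
    return embed(left, weights) + embed(right, weights)


# Hypothetical 3-dimensional segment features mapped to 2-dimensional embeddings.
weights = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
features = pair_features([1.0, 0.0, 2.0], [0.5, 1.0, -1.0], weights)
```

In this sketch, pre-training would minimize one of the two losses over the shared weights, and joint training would then update the same weights together with a classifier that consumes `pair_features`.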