Conference: IEEE International Conference on Acoustics, Speech and Signal Processing

Pre-training of Speaker Embeddings for Low-latency Speaker Change Detection in Broadcast News

Abstract

In this work, we investigate pre-training of neural-network-based speaker embeddings for low-latency speaker change detection. Our proposed system takes two speech segments, generates embeddings using shared Siamese layers, and then classifies the concatenated embeddings according to whether the two segments were spoken by the same speaker. We investigate pre-training of the embedding layers based on gender classification, contrastive loss, and triplet loss, as well as joint training of the embedding layers together with a same/different classifier. Training is performed on 2-second single-speaker segments derived from the ground-truth speaker segmentation of broadcast news data. At test time, however, we use the detection system in a practical low-latency setting for annotating automatic closed captions. Unlike in training, test pairs are created around segmentation boundaries produced by automatic speech recognition (ASR). The ASR segments are often shorter than 2 seconds, which causes a duration mismatch during testing. In our experiments, although the baseline i-vector-based classifier performs well, the proposed triplet-loss-based pre-training followed by joint training provides a 7-50% relative F-measure improvement in matched and mismatched conditions. In addition, performance degrades less severely for network-based embeddings than for i-vectors in the variable-duration test conditions.
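
The two metric-learning objectives named in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact loss formulations, the distance measure, or the margin, so the code below assumes common textbook forms with Euclidean distance and a margin of 1.0. All function names (`euclidean`, `contrastive_loss`, `triplet_loss`, `pair_features`) are hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embeddings (assumed distance measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(a: Vector, b: Vector, same_speaker: bool,
                     margin: float = 1.0) -> float:
    """One common contrastive form: pull same-speaker pairs together,
    push different-speaker pairs at least `margin` apart (margin is assumed)."""
    d = euclidean(a, b)
    if same_speaker:
        return d ** 2
    return max(0.0, margin - d) ** 2


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """One common triplet form: the anchor should be closer to the positive
    (same speaker) than to the negative (other speaker) by `margin`."""
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)


def pair_features(left: Vector, right: Vector) -> List[float]:
    """Concatenate the two segment embeddings, as the abstract describes,
    to form the input of a same/different classifier (classifier not shown)."""
    return list(left) + list(right)
```

In this sketch, the embeddings would come from the shared Siamese layers, and `pair_features` only shows the concatenation step that feeds the same/different classifier. The network itself and the joint training procedure are outside what the abstract specifies.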