Conference: IEEE International Conference on Acoustics, Speech and Signal Processing

Improving Reconstruction Loss Based Speaker Embedding in Unsupervised and Semi-Supervised Scenarios

Abstract

Text-to-speech (TTS) models trained to minimize spectrogram reconstruction loss can learn speaker embeddings without explicit speaker identity supervision, unlike x-vector speaker identification (SID) systems. This way of learning speaker embeddings can be useful in unsupervised or semi-supervised scenarios where none, or only some, of the training data have speaker labels. In this paper, we therefore evaluate speaker embeddings learned by training the spectrogram prediction network under unsupervised and semi-supervised scenarios. We experimented with different data sampling strategies. The best one sampled two different segments, A and B, from the same utterance, where the spectrogram of B is predicted given the phone sequence of B and the speaker embedding extracted from A. This method improved equal error rate (EER) by 3.4% relative, compared to using the same whole utterance for both A and B without segmenting. In the unsupervised scenario, the best speaker embedding outperformed i-vectors, the state-of-the-art unsupervised speaker embedding, in speaker verification by 12.9% relative in EER. We observed a high correlation between reconstruction loss and speaker embedding quality. In the semi-supervised scenario, having more unlabeled data in training led to better speaker verification performance. Adding 5314 unlabeled speakers to 800 labeled speakers improved EER by 10.8% relative.
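
The segment-pair sampling described in the abstract could be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not state how segment lengths or positions are chosen, so the fixed length `seg_len`, the non-overlap constraint, the frame-level phone alignment, and the names `sample_segment_pair` and `make_training_example` are all assumptions made for the example.

```python
import random


def sample_segment_pair(num_frames, seg_len, rng=None):
    """Pick two non-overlapping frame ranges (A, B) from one utterance.

    Assumption: both segments have the same fixed length and must not
    overlap. The abstract only says the segments are "different".
    """
    if num_frames < 2 * seg_len:
        raise ValueError("utterance too short for two non-overlapping segments")
    rng = rng or random.Random()
    # Earlier segment: leave room for a second segment after it.
    first_start = rng.randint(0, num_frames - 2 * seg_len)
    # Later segment: starts at or after the end of the earlier one.
    second_start = rng.randint(first_start + seg_len, num_frames - seg_len)
    first = (first_start, first_start + seg_len)
    second = (second_start, second_start + seg_len)
    # Randomly decide which segment plays the role of A and which of B.
    if rng.random() < 0.5:
        return first, second
    return second, first


def make_training_example(spectrogram, frame_phones, seg_len, rng=None):
    """Build one training example from a single utterance.

    Assumption: `spectrogram` and `frame_phones` are frame-aligned
    sequences of equal length (a hypothetical data layout).
    """
    (a0, a1), (b0, b1) = sample_segment_pair(len(spectrogram), seg_len, rng)
    return {
        "embedding_input": spectrogram[a0:a1],  # segment A: speaker embedding source
        "phones": frame_phones[b0:b1],          # phone sequence of segment B
        "target": spectrogram[b0:b1],           # spectrogram of B to be predicted
    }
```

Under these assumptions, the speaker encoder never sees the frames it is asked to reconstruct. The baseline in the abstract corresponds to using the whole utterance as both A and B.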
