
Improving Reconstruction Loss Based Speaker Embedding in Unsupervised and Semi-Supervised Scenarios


Abstract

Text-to-speech (TTS) models trained to minimize the spectrogram reconstruction loss can learn speaker embeddings without explicit speaker identity supervision, unlike x-vector speaker identification (SID) systems. This way of learning speaker embeddings can be useful in unsupervised or semi-supervised scenarios where none, or only some, of the training data have speaker labels. In this paper, we therefore evaluate speaker embeddings learned by training the spectrogram prediction network under unsupervised and semi-supervised scenarios. We experimented with different data sampling strategies. The best one was sampling two different segments, A and B, from the same utterance, where the spectrogram of B is predicted given the phone sequence of B and the speaker embedding extracted from A. This method improved EER by 3.4% relative compared to using the same utterance for both A and B without segmenting. In the unsupervised scenario, the best speaker embedding outperformed i-vectors, the state-of-the-art unsupervised speaker embedding, in speaker verification by 12.9% relative in EER. We observed a high correlation between reconstruction loss and speaker embedding quality. In the semi-supervised scenario, having more unlabeled data in training led to better speaker verification performance. Adding 5314 unlabeled speakers to 800 labeled speakers improved EER by 10.8% relative.
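
The A/B sampling strategy described in the abstract could be sketched roughly as follows. This is only an illustration, not the authors' implementation: the function name `sample_segment_pair`, the fixed segment length, and the choice to make the two segments non-overlapping are all assumptions (the abstract only says the segments are "different"). In a real pipeline, the phone sequence for segment B would also have to be cut out using a phone-level alignment, which is not shown here.

```python
import random


def sample_segment_pair(num_frames, seg_len, rng=None):
    """Pick two non-overlapping segments A and B from one utterance.

    Hypothetical helper: returns ((a_start, a_end), (b_start, b_end)) as
    frame index ranges with exclusive ends. Segment A would feed the
    speaker encoder; segment B would be the reconstruction target.
    """
    rng = rng or random
    if seg_len <= 0:
        raise ValueError("seg_len must be positive")
    if num_frames < 2 * seg_len:
        raise ValueError("utterance too short for two non-overlapping segments")

    # Frames left over once both segments are accounted for.
    slack = num_frames - 2 * seg_len
    # Start of the earlier segment, leaving room for the later one.
    first = rng.randint(0, slack)
    # Random gap between the two segments (may be zero).
    gap = rng.randint(0, slack - first)
    second = first + seg_len + gap

    segments = [(first, first + seg_len), (second, second + seg_len)]
    # Randomly decide which segment plays the role of A and which of B.
    if rng.random() < 0.5:
        segments.reverse()
    return segments[0], segments[1]


if __name__ == "__main__":
    a, b = sample_segment_pair(num_frames=500, seg_len=100, rng=random.Random(0))
    print("A frames:", a, "B frames:", b)
```

Under these assumptions, the baseline that the abstract compares against (the same utterance for both A and B, without segmenting) corresponds to passing the full frame range for both roles.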

Bibliographic record

Source: IEEE International Conference on Acoustics, Speech and Signal Processing, 2021, pp. 6733-6737 (5 pages)
Authors: Jaejin Cho; Piotr Żelasko; Jesús Villalba; Najim Dehak
Original format: PDF
Keywords: Training; Correlation; Conferences; Pipelines; Training data; Speech recognition; Decoding

