
TristouNet: Triplet loss for speaker turn embedding



Abstract

TristouNet is a neural network architecture based on Long Short-Term Memory recurrent networks, meant to project speech sequences into a fixed-dimensional Euclidean space. Thanks to the triplet loss paradigm used for training, the resulting sequence embeddings can be compared directly with the Euclidean distance for speaker comparison purposes. Experiments on short (between 500ms and 5s) speech turn comparison and speaker change detection show that TristouNet brings significant improvements over the current state-of-the-art techniques for both tasks.
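The abstract describes the general recipe of an LSTM-based network that maps a speech turn to a fixed-dimensional embedding, trained with a triplet loss so that same-speaker turns end up close in Euclidean distance. The sketch below illustrates that recipe in PyTorch; it is not the authors' implementation, and the layer sizes, temporal pooling, margin value, and feature dimensionality are all assumptions chosen for the example.

```python
# Illustrative sketch only (not the TristouNet reference code): an LSTM-based
# embedding network trained with a triplet loss. Sizes and margin are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TripletEmbedder(nn.Module):
    def __init__(self, n_features=35, hidden_size=16, embedding_dim=16):
        super().__init__()
        # Bidirectional LSTM reads a variable-length acoustic feature sequence.
        self.lstm = nn.LSTM(n_features, hidden_size,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_size, embedding_dim)

    def forward(self, x):
        # x: (batch, time, n_features)
        out, _ = self.lstm(x)
        # Average over time to get a fixed-dimensional vector, then
        # L2-normalize so Euclidean distances between embeddings are bounded.
        pooled = out.mean(dim=1)
        return F.normalize(self.fc(pooled), p=2, dim=1)

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Pull same-speaker pairs together and push different-speaker pairs
    # apart by at least `margin`, measured with the Euclidean distance.
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

# Usage: embed anchor/positive/negative batches of speech turns, compute loss.
model = TripletEmbedder()
a, p, n = (torch.randn(8, 100, 35) for _ in range(3))
loss = triplet_loss(model(a), model(p), model(n))
loss.backward()
```

At test time, speaker comparison then reduces to thresholding the Euclidean distance between two embeddings; PyTorch's built-in `nn.TripletMarginLoss` could replace the hand-written loss above.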

