Conference: Odyssey 2010: The Speaker and Language Recognition Workshop

Unsupervised Speaker Adaptation based on the Cosine Similarity for Text-Independent Speaker Verification



Abstract

This paper proposes a new approach to unsupervised speaker adaptation, inspired by the recent success of the factor analysis-based Total Variability Approach to text-independent speaker verification [1]. This approach represents speaker variability in terms of low-dimensional total factor vectors and, when paired with the simplicity of cosine similarity scoring, allows for easy manipulation and efficient computation [2]. The development of our adaptation algorithm is motivated by three goals: to have a robust method of setting an adaptation threshold, to minimize the computation required for each adaptation update, and to simplify the associated score normalization procedures where possible. To address the last of these, we propose the Symmetric Normalization (S-norm) method, which takes advantage of the symmetry in cosine similarity scoring and performs competitively with ZT-norm while requiring fewer parameter calculations. In subsequent experiments, we also assess an attempt to replace score normalization procedures altogether with a Normalized Cosine Similarity scoring function [3].

We evaluated the performance of our unsupervised speaker adaptation algorithm under various score normalization procedures on the 10sec-10sec and core conditions of the 2008 NIST SRE dataset. Using results without adaptation as our baseline, we found that the proposed methods consistently improve speaker verification performance and achieve state-of-the-art results.
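The abstract does not give the S-norm formula, so the following is only a minimal sketch under stated assumptions. It assumes that each utterance is already represented by a total factor vector, that an impostor cohort of such vectors is available, and that S-norm sums two z-normalized copies of the raw cosine score, one per side of the trial. The names `cosine_score`, `s_norm`, and `cohort` are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, pstdev


def cosine_score(w1, w2):
    """Cosine similarity between two total factor vectors (plain lists of floats)."""
    dot = sum(a * b for a, b in zip(w1, w2))
    norm1 = math.sqrt(sum(a * a for a in w1))
    norm2 = math.sqrt(sum(b * b for b in w2))
    return dot / (norm1 * norm2)


def s_norm(w_target, w_test, cohort):
    """Symmetric normalization, ASSUMED form (not confirmed by the abstract).

    The raw score is z-normalized once with cohort statistics of the target
    vector and once with cohort statistics of the test vector; the two
    normalized scores are then summed. Requires a cohort whose scores are
    not all identical, otherwise the standard deviation is zero.
    """
    raw = cosine_score(w_target, w_test)
    target_scores = [cosine_score(w_target, c) for c in cohort]
    test_scores = [cosine_score(w_test, c) for c in cohort]
    z_target = (raw - mean(target_scores)) / pstdev(target_scores)
    z_test = (raw - mean(test_scores)) / pstdev(test_scores)
    return z_target + z_test


if __name__ == "__main__":
    # Toy 3-dimensional vectors; real total factor vectors are much larger.
    target = [1.0, 2.0, 0.5]
    test = [0.9, 1.8, 0.7]
    cohort = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, -1.0, 0.5]]
    print(s_norm(target, test, cohort))
    print(s_norm(test, target, cohort))  # same value: the roles are interchangeable
```

Because cosine similarity is symmetric in its arguments, swapping the target and test vectors leaves the normalized score unchanged in this sketch, and only one mean and one standard deviation per vector are needed. That is one plausible reading of the "fewer parameter calculations" claim, not a confirmed account of the authors' implementation.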

Bibliographic details

  • Conference location: Brno (CS)
  • Author affiliations

    MIT Computer Science and Artificial Intelligence Laboratory 32 Vassar Street, Cambridge, MA 02139, USA;

    MIT Computer Science and Artificial Intelligence Laboratory 32 Vassar Street, Cambridge, MA 02139, USA;

    Laboratoire de Recherche et de Developpement de l'EPITA (LRDE), Paris, France;

    MIT Computer Science and Artificial Intelligence Laboratory 32 Vassar Street, Cambridge, MA 02139, USA;

  • Original format: PDF
  • Language: English (eng)
  • Chinese Library Classification: Speech signal processing
