Fast i-vector denoising using MAP estimation and a noise distributions database for robust speaker recognition


Abstract

Since the i-vector paradigm was introduced in the field of speaker recognition, many techniques have been proposed to deal with additive noise within this framework. Because the effect of noise in the i-vector space is complex, much of this effort has gone into handling noise in other domains (speech enhancement, feature compensation, robust i-vector extraction and robust scoring). As far as we know, there has been no serious attempt to handle the noise problem directly in the i-vector space without relying on data distributions computed on a prior domain. The aim of this paper is twofold. First, it proposes a full-covariance Gaussian modeling of the clean i-vector and noise distributions in the i-vector space and introduces a technique, i-MAP, that estimates a clean i-vector from its noisy version and the noise density function using the MAP approach. Based on NIST data, we show that it is possible to improve the baseline system performance by up to 60%. Second, to make this algorithm usable in a real application and reduce the computational time needed by i-MAP, we propose an extension that builds a noise distribution database in the i-vector space in an off-line step and uses it later in the test phase. We show that, with a sufficiently large noise distribution database, this approach achieves comparable results (up to 57% relative EER improvement).
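
The MAP step described in the abstract can be illustrated with a minimal sketch. It assumes, beyond what the abstract states, that a noisy i-vector is the sum of a clean i-vector and an independent noise term, each following a full-covariance Gaussian; under that assumption the MAP estimate has a closed form. The function name `map_denoise` and all parameter names are hypothetical and are not taken from the paper.

```python
import numpy as np

def map_denoise(y, mu_x, sigma_x, mu_n, sigma_n):
    """MAP estimate of a clean vector x from a noisy vector y.

    Assumed model (an illustration, not confirmed by the abstract):
        y = x + n,  x ~ N(mu_x, sigma_x),  n ~ N(mu_n, sigma_n),
    with x and n independent and both covariances full and invertible.
    """
    # Precision-weighted terms, computed with solves instead of explicit inverses.
    prec_x = np.linalg.inv(sigma_x)
    prec_n = np.linalg.inv(sigma_n)
    # The posterior of x given y is Gaussian, so its mode equals its mean and
    # satisfies (prec_x + prec_n) x = prec_x mu_x + prec_n (y - mu_n).
    rhs = prec_x @ mu_x + prec_n @ (y - mu_n)
    return np.linalg.solve(prec_x + prec_n, rhs)

# Toy 2-D example: equal unit covariances and zero means, so the estimate
# lies halfway between the prior mean (0) and the observation.
y = np.array([2.0, 4.0])
x_hat = map_denoise(y, np.zeros(2), np.eye(2), np.zeros(2), np.eye(2))
```

In this sketch a small noise covariance pulls the estimate toward `y - mu_n`, while a large one pulls it toward the prior mean `mu_x`. The off-line noise distribution database mentioned in the abstract would supply `mu_n` and `sigma_n` at test time; how an entry is selected is not described here and is left out.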

Bibliographic record

  • Source
    Computer Speech and Language | 2017, Issue 9 | pp. 104-122 | 19 pages
  • Author affiliations

    Laboratoire Informatique d'Avignon (LIA), Universite d'Avignon Agroparc BP 1228, 84911 Avignon Cedex 9, France;

    Laboratoire Informatique d'Avignon (LIA), Universite d'Avignon Agroparc BP 1228, 84911 Avignon Cedex 9, France;

    Laboratoire Informatique d'Avignon (LIA), Universite d'Avignon Agroparc BP 1228, 84911 Avignon Cedex 9, France;

    Laboratoire Informatique d'Avignon (LIA), Universite d'Avignon Agroparc BP 1228, 84911 Avignon Cedex 9, France;

    Laboratoire Informatique d'Avignon (LIA), Universite d'Avignon Agroparc BP 1228, 84911 Avignon Cedex 9, France;

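  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA);
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification:
  • Keywords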

    i-vectors; MAP adaptation; Speaker recognition; Additive noise;

