IEEE/ACM Transactions on Audio, Speech, and Language Processing

Speaker Recognition With Random Digit Strings Using Uncertainty Normalized HMM-Based i-Vectors

Abstract

In this paper, we combine Hidden Markov Models (HMMs) with i-vector extractors to address text-dependent speaker recognition with random digit strings. We employ digit-specific HMMs to segment the utterances into digits, to align frames to HMM states and to extract Baum-Welch statistics. Exploiting the natural partition of the input features into digits, we train a digit-specific i-vector extractor on top of each HMM and extract well-localized i-vectors, each modelling only the phonetic content of a single digit. We then examine ways to perform channel and uncertainty compensation, and we propose a novel method for using the uncertainty in the i-vector estimates. Experiments on RSR2015 part III show that the proposed method attains an Equal Error Rate (EER) of 1.52% for male and 1.77% for female speakers, outperforming state-of-the-art methods such as x-vectors, trained on vast amounts of data. Furthermore, these results are attained by a single system trained entirely on RSR2015 and scored with a simple score-normalized cosine distance. Moreover, we show that omitting channel compensation causes only a minor degradation in performance, meaning that the system attains state-of-the-art results even without recordings from multiple handsets per speaker for training or enrolment. Similar conclusions are drawn from our experiments on the RedDots corpus, where the same method is evaluated on phrases. Finally, we report results with bottleneck features and show that fusing them with spectral features yields further improvement.
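The abstract refers to Baum-Welch statistics, uncertainty in i-vector estimates, score-normalized cosine scoring and EER. The following Python/NumPy sketch illustrates those textbook building blocks under stated assumptions; it is not the authors' implementation. It computes the standard i-vector posterior mean and covariance from per-state statistics (the covariance is the estimation uncertainty; the paper's proposed uncertainty normalization is not reproduced here), scores with cosine similarity, applies symmetric score normalization (s-norm) against a cohort, and computes EER. The s-norm choice is an assumption, since the abstract does not name the normalization used. The function names, array shapes and toy random parameters are hypothetical, and the digit segmentation and HMM state alignment are assumed to have already produced the statistics.

```python
import numpy as np


def ivector_posterior(N, F, T, Sigma_inv):
    """Posterior mean (the i-vector) and covariance (its uncertainty).

    Standard i-vector model; shapes are illustrative assumptions:
      N         (C,)       zeroth-order Baum-Welch statistics, one per HMM state
      F         (C, D)     first-order statistics, centred on each state's mean
      T         (C, D, R)  total-variability matrix, split per state
      Sigma_inv (C, D)     inverse diagonal covariances of the states
    """
    C, D, R = T.shape
    precision = np.eye(R)             # from the N(0, I) prior on the latent vector
    linear = np.zeros(R)
    for c in range(C):
        TtS = T[c].T * Sigma_inv[c]   # T_c^T Sigma_c^{-1}, shape (R, D)
        precision += N[c] * (TtS @ T[c])
        linear += TtS @ F[c]
    cov = np.linalg.inv(precision)    # shrinks as more frames are aligned
    return cov @ linear, cov


def cosine(a, b):
    """Cosine similarity between two i-vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def s_norm(enrol, test, cohort):
    """Cosine score normalized symmetrically against a cohort (assumed s-norm)."""
    raw = cosine(enrol, test)
    e = np.array([cosine(enrol, x) for x in cohort])
    t = np.array([cosine(test, x) for x in cohort])
    return float(0.5 * ((raw - e.mean()) / e.std() + (raw - t.mean()) / t.std()))


def eer(target, nontarget):
    """Equal Error Rate, sweeping every observed score as a threshold."""
    target, nontarget = np.asarray(target), np.asarray(nontarget)
    best_gap, best_eer = np.inf, 1.0
    for th in np.sort(np.concatenate([target, nontarget])):
        frr = np.mean(target < th)       # fraction of target trials rejected
        far = np.mean(nontarget >= th)   # fraction of non-target trials accepted
        if abs(frr - far) < best_gap:
            best_gap, best_eer = abs(frr - far), (frr + far) / 2
    return float(best_eer)


if __name__ == "__main__":
    # Toy usage with random, untrained parameters.
    rng = np.random.default_rng(0)
    C, D, R = 4, 3, 2
    T = rng.normal(size=(C, D, R))
    Sigma_inv = np.ones((C, D))
    _, cov_few = ivector_posterior(np.full(C, 2.0), rng.normal(size=(C, D)), T, Sigma_inv)
    _, cov_many = ivector_posterior(np.full(C, 50.0), rng.normal(size=(C, D)), T, Sigma_inv)
    print(np.trace(cov_few) > np.trace(cov_many))  # more frames, less uncertainty
```

The posterior precision grows with the zeroth-order statistics, so an i-vector estimated from a short segment such as a single digit carries a larger covariance than one estimated from a long utterance. This is one reason uncertainty handling plausibly matters more in this short-segment setting than in text-independent recognition.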