Journal: IEEE Transactions on Audio, Speech, and Language Processing

Embedded kernel eigenvoice speaker adaptation and its implication to reference speaker weighting

Abstract

Recently, we proposed an improvement to conventional eigenvoice (EV) speaker adaptation using kernel methods. In our novel kernel eigenvoice (KEV) speaker adaptation, speaker supervectors are mapped to a kernel-induced high-dimensional feature space, where eigenvoices are computed using kernel principal component analysis. A new speaker model is then constructed as a linear combination of the leading eigenvoices in the kernel-induced feature space. KEV adaptation was shown to outperform EV, MAP, and MLLR adaptation in a TIDIGITS task with less than 10 s of adaptation speech. Nonetheless, because of the many kernel evaluations it requires, both adaptation and subsequent recognition in KEV adaptation are considerably slower than in conventional EV adaptation. In this paper, we solve the efficiency problem and eliminate all kernel evaluations involving adaptation or testing observations by finding an approximate pre-image of the implicit adapted model found by KEV adaptation in the feature space; we call our new method embedded kernel eigenvoice (eKEV) adaptation. eKEV adaptation is faster than KEV adaptation, and subsequent recognition runs as fast as normal HMM decoding. eKEV adaptation makes use of the multidimensional scaling technique so that the resulting adapted model lies in the span of a subset of carefully chosen training speakers. It is related to the reference speaker weighting (RSW) adaptation method, which is based on speaker clustering. Our experimental results on the Wall Street Journal corpus show that eKEV adaptation continues to outperform EV, MAP, MLLR, and the original RSW method. However, by adopting the way we choose the subset of reference speakers for eKEV adaptation, we may also improve RSW adaptation so that it performs as well as our eKEV adaptation.
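To make the kernel PCA step mentioned in the abstract concrete, here is a minimal Python sketch of computing "kernel eigenvoices" from speaker supervectors and projecting a speaker onto them. It is an illustration under stated assumptions, not the authors' implementation: the plain Gaussian kernel over whole supervectors, the parameter `beta`, the function names, and the random toy data are all hypothetical choices made for this example. The pre-image and multidimensional scaling step of eKEV is not shown.

```python
import numpy as np

def gaussian_kernel(X, Y, beta):
    # Assumed kernel: k(x, y) = exp(-beta * ||x - y||^2).
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-beta * np.maximum(sq, 0.0))

def kernel_eigenvoices(X, beta, n_components):
    # X holds one training-speaker supervector per row.
    n = X.shape[0]
    K = gaussian_kernel(X, X, beta)
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    Kc = H @ K @ H                           # kernel matrix centered in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scale coefficients so each eigenvoice has unit norm in feature space.
    alphas = eigvecs / np.sqrt(eigvals)
    return K, alphas, eigvals

def project(x, X, K, alphas, beta):
    # Coordinates of the centered feature-space image of x on the eigenvoices.
    k = gaussian_kernel(x[None, :], X, beta)[0]
    kc = k - k.mean() - K.mean(axis=0) + K.mean()
    return kc @ alphas

# Toy data: 8 hypothetical "speakers" with 6-dimensional supervectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 6))
beta = 0.1
K, alphas, eigvals = kernel_eigenvoices(X, beta, n_components=3)
coords = project(X[0], X, K, alphas, beta)
print(coords)
```

Every projection of a new observation requires kernel evaluations against all training speakers, which is the cost the abstract says eKEV removes by mapping the adapted model back to an ordinary supervector.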
