Feature-space SVM adaptation for speaker adapted word prominence detection


Abstract

Prosodic cues such as word prominence play a fundamental role in human communication, e.g., to express important information. Because different speakers use a wide variety of features to express prominence, performance differs considerably between speaker-dependent and speaker-independent models. To cope with this variation without training a new speaker-dependent model, speaker adaptation techniques from speech recognition, such as feature-space Maximum Likelihood Linear Regression (fMLLR), have proven very useful. These methods were developed for GMM-HMM based classifiers under the assumption that the data can be modeled well by a mixture of a few Gaussian distributions. In many cases, however, this assumption is too restrictive; in particular, a discriminative classifier such as an SVM often yields far better results than a GMM. Therefore, we propose a new adaptation method that adapts the data to the radial basis function kernel of the SVM. To avoid overfitting, we apply two regularization terms: the first is based on fMLLR, and the second is an L1 regularization that enforces a sparse transformation matrix. We analyze the method in the context of speaker adaptation for word prominence detection, varying the amount of adaptation data and the weights of the regularization terms. We show that our novel method clearly outperforms fMLLR-GMM and fMLLR-SVM based adaptation. (C) 2018 Elsevier Ltd. All rights reserved.
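
The abstract gives the ingredients of the adaptation objective but not its exact form, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes an affine feature transform x' = A x + b, a hinge loss on the decisions of an already trained RBF SVM as the data term, and an L1 penalty on all entries of A. The fMLLR-based term is not specified in the abstract, so it is passed in as a hypothetical callable `fmllr_term`. The toy regularizer `placeholder_fmllr_term` is a stand-in only, and every function and parameter name here is illustrative.

```python
import math

def affine(A, b, x):
    """Apply the feature-space transform x' = A x + b to one feature vector."""
    return [sum(a * xj for a, xj in zip(row, x)) + bi for row, bi in zip(A, b)]

def rbf_kernel(u, v, gamma):
    """Radial basis function kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((ui - vi) ** 2 for ui, vi in zip(u, v)))

def svm_decision(x, support_vectors, dual_coefs, bias, gamma):
    """Decision value of a trained RBF SVM; the SVM parameters stay fixed."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias

def l1_norm(A):
    """Sum of absolute entries of A (assumed form of the sparsity penalty)."""
    return sum(abs(a) for row in A for a in row)

def adaptation_objective(A, b, X, y, svm, fmllr_term, w_fmllr, w_l1):
    """Assumed objective: mean hinge loss on the transformed adaptation data
    plus two weighted regularizers. Only (A, b) would be optimized."""
    hinge = sum(max(0.0, 1.0 - yi * svm_decision(affine(A, b, xi), **svm))
                for xi, yi in zip(X, y)) / len(X)
    return hinge + w_fmllr * fmllr_term(A, b) + w_l1 * l1_norm(A)

def placeholder_fmllr_term(A, b):
    """Stand-in for the fMLLR-based regularizer (NOT the paper's term):
    squared distance of (A, b) from the identity transform."""
    dist_A = sum((a - (1.0 if i == j else 0.0)) ** 2
                 for i, row in enumerate(A) for j, a in enumerate(row))
    return dist_A + sum(bi ** 2 for bi in b)

# Toy setup: a hand-built two-support-vector SVM and two labeled samples.
svm = {"support_vectors": [[0.0, 0.0], [2.0, 2.0]],
       "dual_coefs": [-1.0, 1.0], "bias": 0.0, "gamma": 0.5}
X = [[2.0, 2.0], [0.0, 0.0]]
y = [1, -1]
identity = [[1.0, 0.0], [0.0, 1.0]]
zero_shift = [0.0, 0.0]

data_only = adaptation_objective(identity, zero_shift, X, y, svm,
                                 placeholder_fmllr_term, 0.0, 0.0)
with_l1 = adaptation_objective(identity, zero_shift, X, y, svm,
                               placeholder_fmllr_term, 0.0, 0.5)
print(data_only, with_l1)
```

In practice the two weights would be tuned per amount of adaptation data, which is the trade-off the abstract says the authors analyze. A subgradient or proximal method would be a natural choice for minimizing over A and b because the L1 term is not differentiable at zero.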

Bibliographic details

  • Source
    Computer Speech and Language | 2019, Issue 1 | pp. 198-216 | 19 pages
  • Author affiliations

    Tech Univ Darmstadt, Control Methods & Robot Lab, Landgraf Georg Str 4, D-64283 Darmstadt, Germany;

    Honda Res Inst Europe GmbH, Carl Legien Str 30, D-63073 Offenbach, Germany;

  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    Prosody; Speaker adaptation; FMLLR; SVM; Prominence;

  • Added to database: 2022-08-18 04:05:20
