2011 IEEE Workshop on Automatic Speech Recognition & Understanding

Robust seed model training for speaker adaptation using pseudo-speaker features generated by inverse CMLLR transformation

Abstract

In this paper, we propose a novel acoustic model training method that is suitable for speaker adaptation in speech recognition. Our method is based on feature generation from the data of a small number of speakers. Speaker adaptation methods have been widely used for decades. These methods need a certain amount of adaptation data, and when the data is insufficient, speech recognition performance degrades significantly. If the seed models to be adapted to a specific speaker cover a wider range of speakers, speaker adaptation can perform robustly. To build such robust seed models, we adopt feature generation based on the inverse maximum likelihood linear regression (MLLR) transformation, and then train our seed models using these features. First, we obtain MLLR transformation matrices from a limited number of existing speakers. Then we extract the bases of the MLLR transformation matrices using PCA. The distribution of the weight parameters that express the MLLR transformation matrices of the existing speakers is estimated. Next, we generate pseudo-speaker MLLR transformations by sampling the weight parameters from this distribution, and apply the inverse of each transformation to the normalized features of the existing speakers to generate the pseudo-speakers' features. Finally, using these features, we train the acoustic seed models. Using these seed models, we obtained better speaker adaptation results than with models that were only environmentally adapted.
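
The pipeline in the abstract (PCA over speaker transforms, sampling of weights, inverse transformation of normalized features) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes each transform is an affine matrix W = [A | b] of shape (d, d + 1) that maps speaker features to normalized features as y = A x + b, and it assumes an independent Gaussian per PCA weight, which the abstract does not specify. All function names are hypothetical.

```python
import numpy as np

def fit_transform_basis(transforms, n_components):
    """PCA over flattened affine transforms W = [A | b], each of shape (d, d + 1)."""
    flat = np.stack([w.ravel() for w in transforms])  # (n_speakers, d * (d + 1))
    mean = flat.mean(axis=0)
    centered = flat - mean
    # Rows of vt are the principal directions of the centered transforms.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    weights = centered @ basis.T  # per-speaker PCA weights
    return mean, basis, weights

def sample_pseudo_transform(mean, basis, weights, shape, rng):
    """Draw PCA weights and rebuild a pseudo-speaker transform.

    Assumption: each weight follows an independent Gaussian fitted to the
    existing speakers' weights.
    """
    mu = weights.mean(axis=0)
    sigma = weights.std(axis=0)
    w = rng.normal(mu, sigma)
    return (mean + w @ basis).reshape(shape)

def apply_inverse(transform, normalized_feats):
    """Map normalized frames y back to pseudo-speaker frames x = A^{-1} (y - b)."""
    a, b = transform[:, :-1], transform[:, -1]
    # Solve A x = (y - b) for all frames at once; frames are rows.
    return np.linalg.solve(a, (normalized_feats - b).T).T
```

Under these assumptions, calling `apply_inverse` with several sampled transforms on the same normalized features yields one pseudo-speaker feature set per sample, which could then be pooled for seed model training.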