Conference: IEEE International Conference on Acoustics, Speech and Signal Processing

Source Domain Data Selection for Improved Transfer Learning Targeting Dysarthric Speech Recognition


Abstract

This paper presents an improved transfer learning framework for building robust personalised speech recognition models for speakers with dysarthria. As the transfer learning baseline, a state-of-the-art CNN-TDNN-F ASR acoustic model trained solely on source domain data is adapted to the target domain via neural network weight adaptation, using the limited data available from target dysarthric speakers. Results on the UASpeech corpus show that the linear weights in the neural layers play the most important role in improving the modelling of dysarthric speech, yielding average relative recognition improvements of 11.6% and 7.6% over conventional speaker-dependent training and data combination, respectively. To further improve transferability towards the target domain, we propose an utterance-based selection of source domain data based on the entropy of the posterior probability, which our analysis finds to statistically follow a Gaussian distribution. Compared to speaker-based data selection via a dysarthria similarity measure, this allows a more accurate selection of potentially beneficial source domain data, either for enlarging the target domain training pool or for constructing an intermediate domain for incremental transfer learning. This yields a further absolute recognition improvement of nearly 2% over the transfer learning baseline for speakers with moderate to severe dysarthria.
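The abstract gives no details of the selection rule, so the following is only a minimal sketch of what entropy-based utterance selection might look like, not the authors' method. It assumes that each source-domain utterance is scored by the mean Shannon entropy of its frame-level posteriors, that a Gaussian is fitted to those scores, and that utterances inside a z-score band are kept. The function names, the toy posteriors, and the band `[z_low, z_high]` are all hypothetical.

```python
import math
from statistics import mean, stdev


def frame_entropy(posterior):
    """Shannon entropy (in nats) of one frame's posterior distribution."""
    return -sum(p * math.log(p) for p in posterior if p > 0.0)


def utterance_entropy(frames):
    """Mean frame entropy over an utterance (a list of posterior vectors)."""
    return mean(frame_entropy(frame) for frame in frames)


def select_utterances(utterances, z_low=-1.0, z_high=1.0):
    """Return sorted ids of utterances whose entropy z-score is in the band.

    The Gaussian is fitted with the sample mean and standard deviation of
    the per-utterance entropies. The band limits are an assumption made for
    illustration; the abstract does not state the actual criterion.
    """
    scores = {uid: utterance_entropy(frames) for uid, frames in utterances.items()}
    values = list(scores.values())
    mu = mean(values)
    sigma = stdev(values)  # requires at least two utterances
    if sigma == 0.0:
        return sorted(scores)  # all scores identical: nothing to separate
    return sorted(
        uid for uid, score in scores.items()
        if z_low <= (score - mu) / sigma <= z_high
    )


# Toy source-domain pool: three frames per utterance, four output classes.
toy = {
    "utt_a": [[0.97, 0.01, 0.01, 0.01]] * 3,  # very confident posteriors
    "utt_b": [[0.70, 0.10, 0.10, 0.10]] * 3,
    "utt_c": [[0.60, 0.20, 0.10, 0.10]] * 3,
    "utt_d": [[0.25, 0.25, 0.25, 0.25]] * 3,  # maximally uncertain posteriors
}

selected = select_utterances(toy)
print(selected)
```

In this toy pool the very confident utterance `utt_a` falls more than one standard deviation below the mean entropy and is excluded. Whether low-entropy or high-entropy source utterances are the useful ones for a given dysarthric speaker is something the paper itself would have to settle; the band here is just a placeholder.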
