Published in: IEEE Journal of Selected Topics in Signal Processing
Data Augmentation Using Virtual Microphone Array Synthesis and Multi-Resolution Feature Extraction for Isolated Word Dysarthric Speech Recognition

Abstract

Dysarthria is a motor speech disorder that affects the articulatory system and impairs the speech communication of those affected. A speech recognition-based augmentative and alternative communication (AAC) aid is an attractive way to address these communication problems. However, the successful development of an automatic speech recognition (ASR)-based aid depends on the availability of sufficient speech data for training. Building an ASR system for dysarthric speakers is difficult because of the limited amount of training data and the large inter- and intra-speaker variability. Using speech data from speakers without dysarthria for data augmentation or adaptation is extremely challenging for dysarthric speakers with low intelligibility, because the acoustic characteristics of the two categories of speakers differ greatly. In this article, a two-level data augmentation is performed on dysarthric speech: virtual linear microphone array-based synthesis followed by multi-resolution feature extraction. With the augmented speech data, an isolated-word hybrid DNN-HMM ASR system is trained on the UA-Speech corpus and on a Tamil dysarthric speech corpus developed by the authors. Compared with recent data augmentation work reported for dysarthric speech recognition, the ASR system achieves WER reductions of up to 32.79% and 35.75% for dysarthric speakers with low and very low intelligibility, respectively.
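The abstract names the two augmentation levels but not their details. The following is a minimal sketch of the general idea, not the authors' implementation. It assumes a far-field source and a uniform linear array with hypothetical parameters (4 virtual microphones, 5 cm spacing, 60° arrival angle, speed of sound 343 m/s). Per-microphone fractional delays are applied as frequency-domain phase shifts. As a stand-in for the paper's multi-resolution features, it computes log power spectra at 20, 25 and 30 ms window lengths. The function names are illustrative, and the paper's array geometry, room modelling and feature type (for example MFCCs) may differ.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def virtual_linear_array(signal, fs, num_mics=4, spacing=0.05, angle_deg=60.0):
    """Hypothetical helper: simulate a far-field plane wave on a uniform linear array.

    Microphone m receives the signal delayed by tau_m = m * spacing * cos(angle) / c,
    shifted so the smallest delay is zero. The fractional delay is applied as a
    linear phase shift in the frequency domain. The signal is zero-padded first so
    the circular shift does not wrap samples around. Returns shape (num_mics, len(signal)).
    """
    n = len(signal)
    taus = np.arange(num_mics) * spacing * np.cos(np.deg2rad(angle_deg)) / SPEED_OF_SOUND
    taus -= taus.min()  # make every delay non-negative
    pad = int(np.ceil(taus.max() * fs)) + 1
    padded = np.concatenate([signal, np.zeros(pad)])
    spectrum = np.fft.rfft(padded)
    freqs = np.fft.rfftfreq(len(padded), d=1.0 / fs)
    channels = [
        np.fft.irfft(spectrum * np.exp(-2j * np.pi * freqs * tau), n=len(padded))[:n]
        for tau in taus
    ]
    return np.stack(channels)


def log_power_frames(signal, fs, win_ms, hop_ms=10.0, n_fft=512):
    """Hypothetical helper: Hamming-windowed log power spectrum, one row per frame."""
    win = int(round(win_ms * fs / 1000))
    hop = int(round(hop_ms * fs / 1000))
    assert win <= n_fft, "window longer than FFT size"
    if len(signal) < win:
        signal = np.pad(signal, (0, win - len(signal)))
    num_frames = 1 + (len(signal) - win) // hop
    window = np.hamming(win)
    frames = np.stack([signal[i * hop : i * hop + win] * window for i in range(num_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    return np.log(power + 1e-10)  # small floor avoids log(0)


def augment(signal, fs, window_lengths_ms=(20.0, 25.0, 30.0), **array_kwargs):
    """Two-level augmentation sketch: every virtual channel at every window length."""
    channels = virtual_linear_array(signal, fs, **array_kwargs)
    return [log_power_frames(ch, fs, w) for ch in channels for w in window_lengths_ms]
```

With these defaults, each utterance yields `num_mics × len(window_lengths_ms)` feature matrices (12 here). All of them share the same feature dimension (257 bins with a 512-point FFT), so they could be added as extra training examples for an acoustic model such as the DNN-HMM described above. A fixed FFT size keeps the dimension constant across resolutions, while the different window lengths trade time resolution for frequency resolution.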