Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Data Augmentation for Deep Neural Network Acoustic Modeling

Abstract

This paper investigates data augmentation for deep neural network acoustic modeling based on label-preserving transformations to deal with data sparsity. Two data augmentation approaches, vocal tract length perturbation (VTLP) and stochastic feature mapping (SFM), are investigated for both deep neural networks (DNNs) and convolutional neural networks (CNNs). The approaches are focused on increasing speaker and speech variations of the limited training data such that the acoustic models trained with the augmented data are more robust to such variations. In addition, a two-stage data augmentation scheme based on a stacked architecture is proposed to combine VTLP and SFM as complementary approaches. Experiments are conducted on Assamese and Haitian Creole, two development languages of the IARPA Babel program, and improved performance on automatic speech recognition (ASR) and keyword search (KWS) is reported.
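
The abstract does not spell out how VTLP perturbs the data. As a rough illustration only, the sketch below assumes the piecewise-linear frequency warp commonly used for vocal tract length perturbation, with one random warp factor drawn per utterance. The function names, the 0.9 to 1.1 sampling range, the 16 kHz sample rate, and the 4800 Hz boundary frequency are illustrative assumptions, not values taken from this paper.

```python
import random


def vtlp_warp(freq_hz, alpha, sample_rate=16000.0, f_hi=4800.0):
    """Warp one frequency (Hz) by factor alpha with a piecewise-linear map.

    Assumed form: frequencies below a boundary are scaled by alpha; above it,
    a second linear segment keeps the Nyquist frequency fixed so the warped
    axis still spans the full band. f_hi and sample_rate are assumptions.
    """
    nyquist = sample_rate / 2.0
    scaled_edge = f_hi * min(alpha, 1.0)   # where the first segment ends after warping
    edge = scaled_edge / alpha             # the same point before warping
    if freq_hz <= edge:
        return freq_hz * alpha
    slope = (nyquist - scaled_edge) / (nyquist - edge)
    return nyquist - slope * (nyquist - freq_hz)


def sample_warp_factor(rng, low=0.9, high=1.1):
    """Draw a per-utterance warp factor (the range is an assumed default)."""
    return rng.uniform(low, high)


# Hypothetical usage: warp a few filterbank center frequencies for one utterance.
rng = random.Random(0)
alpha = sample_warp_factor(rng)
centers = [250.0, 1000.0, 4000.0, 7000.0]
warped = [vtlp_warp(f, alpha) for f in centers]
```

Because the transcript of the utterance is unchanged, each warped copy can be added to the training set under its original labels, which is what makes the transformation label-preserving.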
