International Conference on Pattern Recognition

Speaker-aware Multi-Task Learning for automatic speech recognition



Abstract

Overfitting is a commonly encountered issue in automatic speech recognition, and its impact is especially severe when the amount of training data is limited. To address this problem, this article investigates acoustic modeling through Multi-Task Learning with two speaker-related auxiliary tasks. Multi-Task Learning is a regularization method that aims to improve the network's generalization ability by training a single model to solve several different but related tasks. In this article, two auxiliary tasks are examined jointly. On the one hand, we consider speaker classification as an auxiliary task, training the acoustic model to recognize the speaker or to find the closest one in the training set. On the other hand, the acoustic model is also trained to extract i-vectors from the standard acoustic features. I-vectors are used effectively in the speaker identification community to characterize a speaker and its acoustic environment. The core idea behind these auxiliary tasks is to give the network additional inter-speaker awareness and thus reduce overfitting. We investigate this Multi-Task Learning setup on the TIMIT database, with acoustic modeling performed by a Recurrent Neural Network with Long Short-Term Memory cells.
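The multi-task setup described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes PyTorch, a shared LSTM encoder with one output head per task, frame-level targets, and illustrative dimensions (40-dim features, 48 phone classes, 462 TIMIT training speakers, 100-dim i-vectors). The loss weights alpha and beta are hypothetical hyper-parameters, not values from the paper.

```python
# Sketch of a speaker-aware multi-task LSTM acoustic model (assumed PyTorch design).
import torch
import torch.nn as nn

class SpeakerAwareMTLAcousticModel(nn.Module):
    def __init__(self, feat_dim=40, hidden_dim=256, num_layers=3,
                 num_phones=48, num_speakers=462, ivector_dim=100):
        super().__init__()
        # Shared LSTM encoder: the auxiliary tasks regularize these weights.
        self.encoder = nn.LSTM(feat_dim, hidden_dim, num_layers, batch_first=True)
        # Main task: frame-level phone classification.
        self.phone_head = nn.Linear(hidden_dim, num_phones)
        # Auxiliary task 1: speaker classification (frame-level, for simplicity).
        self.speaker_head = nn.Linear(hidden_dim, num_speakers)
        # Auxiliary task 2: i-vector regression from the acoustic features.
        self.ivector_head = nn.Linear(hidden_dim, ivector_dim)

    def forward(self, feats):
        # feats: (batch, time, feat_dim)
        h, _ = self.encoder(feats)
        return (self.phone_head(h),     # (batch, time, num_phones)
                self.speaker_head(h),   # (batch, time, num_speakers)
                self.ivector_head(h))   # (batch, time, ivector_dim)

def mtl_loss(phone_logits, spk_logits, ivec_pred,
             phone_targets, spk_targets, ivec_targets,
             alpha=0.3, beta=0.3):
    # Weighted sum of main and auxiliary losses; alpha and beta are assumed values.
    ce, mse = nn.CrossEntropyLoss(), nn.MSELoss()
    loss_phone = ce(phone_logits.flatten(0, 1), phone_targets.flatten())
    loss_spk = ce(spk_logits.flatten(0, 1), spk_targets.flatten())
    loss_ivec = mse(ivec_pred, ivec_targets)
    return loss_phone + alpha * loss_spk + beta * loss_ivec

if __name__ == "__main__":
    model = SpeakerAwareMTLAcousticModel()
    feats = torch.randn(8, 200, 40)            # dummy batch of filterbank frames
    phone_t = torch.randint(0, 48, (8, 200))   # dummy phone labels
    spk_t = torch.randint(0, 462, (8, 200))    # dummy speaker labels
    ivec_t = torch.randn(8, 200, 100)          # dummy i-vector targets
    print(mtl_loss(*model(feats), phone_t, spk_t, ivec_t))
```

At test time only the phone head would be used for recognition; the speaker and i-vector heads exist solely to inject inter-speaker awareness into the shared encoder during training.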

