I-Vector estimation as auxiliary task for Multi-Task Learning based acoustic modeling for automatic speech recognition

Abstract

I-vectors have been successfully applied in the speaker identification community to characterize a speaker and the speaker's acoustic environment. Recently, i-vectors have also proven useful in automatic speech recognition when concatenated with standard acoustic features. Instead of feeding i-vectors directly to the acoustic model, we investigate a Multi-Task Learning approach in which a neural network is trained to simultaneously estimate the phone-state posterior probabilities and extract i-vectors, using the standard acoustic features as input. Multi-Task Learning is a regularization method that aims to improve a network's generalization ability by training a single network to solve several different but related tasks. The core idea of using i-vector extraction as an auxiliary task is to give the network additional inter-speaker awareness and thus reduce overfitting. Overfitting is a common problem in speech recognition and is especially harmful when the amount of training data is limited. The proposed setup is trained and tested on the TIMIT database, and acoustic modeling is performed with a Recurrent Neural Network with Long Short-Term Memory cells.
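The multi-task objective described in the abstract can be illustrated with a minimal sketch. The abstract does not state the loss functions or how the two tasks are weighted, so everything here is an assumption: cross-entropy for the phone-state task, mean squared error for the i-vector regression, and a hypothetical `aux_weight` parameter that scales the auxiliary term. The function names are illustrative and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Main-task loss (assumed): negative log-probability of the target phone state."""
    return -math.log(softmax(logits)[target_index])


def mean_squared_error(predicted, reference):
    """Auxiliary-task loss (assumed): squared error between predicted and reference i-vector."""
    return sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference)


def multi_task_loss(state_logits, target_state, ivector_pred, ivector_ref, aux_weight=0.1):
    """Combine both task losses; aux_weight is a hypothetical hyperparameter."""
    main = cross_entropy(state_logits, target_state)
    aux = mean_squared_error(ivector_pred, ivector_ref)
    return main + aux_weight * aux, main, aux


# One frame: three phone-state scores and a two-dimensional toy i-vector.
total, main, aux = multi_task_loss([2.0, 0.5, -1.0], 0, [0.1, -0.2], [0.0, 0.0])
print(f"total={total:.4f} main={main:.4f} aux={aux:.4f}")
```

In a shared network, both output layers would sit on top of the same hidden layers, so the gradient of the auxiliary term also shapes the shared representation. Setting `aux_weight` to zero recovers ordinary single-task training in this sketch.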

Bibliographic details

Source
IEEE Workshop on Spoken Language Technology | 2016 | pp. 1-7 | 7 pages
Authors
Gueorgui Pironkov; Stéphane Dupont; Thierry Dutoit
Original format
PDF
Keywords
Acoustics; Training; Feature extraction; Speech recognition; Speech; Standards; Machine learning
