ETRI Journal

Fast speaker adaptation using extended diagonal linear transformation for deep neural networks


Abstract

This paper explores new techniques based on a hidden-layer linear transformation for fast speaker adaptation in deep neural networks (DNNs). Conventional methods using affine transformations are impractical because they require a relatively large number of adaptation parameters. Methods employing singular-value decomposition (SVD) are used instead because they effectively reduce the number of adaptive parameters; however, matrix decomposition is computationally expensive for online services. We propose an extended diagonal linear transformation method that minimizes the number of adaptation parameters without SVD and improves performance on tasks that require smaller degrees of adaptation. On Korean large-vocabulary continuous speech recognition (LVCSR) tasks, the proposed method shows significant improvements, with error-reduction rates of 8.4% and 17.1% when adapting on five and 50 conversational sentences, respectively. Compared with SVD-based adaptation methods, it achieves higher recognition performance with fewer parameters.
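The parameter savings the abstract describes can be illustrated with a minimal sketch. The snippet below (an assumption for illustration, not the paper's actual implementation; the paper's "extended" variant adds further structure beyond a pure diagonal) contrasts a full affine adaptation layer, W·h + b, with a diagonal linear transformation, d ⊙ h + b, applied to a hidden-layer activation:

```python
import numpy as np

hidden_dim = 512  # illustrative hidden-layer width

# Full affine adaptation layer (W @ h + b): hidden_dim**2 + hidden_dim parameters.
affine_params = hidden_dim ** 2 + hidden_dim

# Diagonal linear transformation (d * h + b): only 2 * hidden_dim parameters.
d = np.ones(hidden_dim)    # per-unit scale, initialized to the identity transform
b = np.zeros(hidden_dim)   # per-unit bias, initialized to zero

def diagonal_adapt(h):
    """Apply the speaker-adaptation transform to a hidden activation vector."""
    return d * h + b

# With identity initialization, activations pass through unchanged,
# so adaptation starts from the speaker-independent model.
h = np.random.default_rng(0).standard_normal(hidden_dim)
assert np.allclose(diagonal_adapt(h), h)

diag_params = 2 * hidden_dim
print(affine_params, diag_params)  # 262656 vs 1024 adaptation parameters
```

With so few parameters per speaker, the diagonal transform can be estimated from only a handful of adaptation sentences without the matrix decomposition that SVD-based methods require, which is the trade-off the abstract highlights.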
