Investigation of Fast and Efficient Methods for Multi-Speaker Modeling and Speaker Adaptation

Abstract

In this paper, we propose a novel method for fast and efficient few-shot TTS that disentangles linguistic and speaker representations. Specifically, an adversarial training strategy is first employed to remove speaker information from the linguistic representations. The speaker representations are then extracted from audio signals by a speaker encoder with a random sampling mechanism and a speaker classifier, with the aim of extracting speaker embedding features that are independent of content information (such as prosody and style). Meanwhile, for faster and more efficient adaptation, we further introduce prior alignment knowledge between the text and audio pairs and propose a multi-alignment guided attention to help attention learning. Experimental results show that, when adapting to new speakers with 20 utterances, the proposed method not only generates speech with higher quality and speaker similarity, with average absolute improvements of 0.26 and 0.30 in MOS respectively, but also converges much faster and more efficiently. Moreover, we achieve a MOS of 4.45 for a premium voice, which outperforms a single-speaker model with a MOS of 4.23.
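The abstract does not specify how the multi-alignment guided attention is computed. As a rough illustration of the general idea of guiding attention with a prior text-audio alignment, the following minimal sketch penalizes attention weight that falls far from the token a prior alignment assigns to each decoder frame. The function names, the per-token duration format, and the Gaussian-shaped soft mask with width `sigma` are all assumptions made for this example and are not taken from the paper.

```python
import math


def prior_alignment_centers(durations):
    """Map each decoder frame to the token it belongs to.

    durations: hypothetical prior alignment, one frame count per input token.
    """
    frame_to_token = []
    for token_index, n_frames in enumerate(durations):
        frame_to_token.extend([token_index] * n_frames)
    return frame_to_token


def guided_attention_penalty(attention, durations, sigma=1.0):
    """Average per-frame penalty for attention that strays from the prior.

    attention: list of rows, one per decoder frame; each row holds the
               attention weights over the input tokens.
    sigma:     assumed width of the soft mask, in tokens.
    """
    if not attention:
        return 0.0
    frame_to_token = prior_alignment_centers(durations)
    if len(frame_to_token) != len(attention):
        raise ValueError("durations must sum to the number of decoder frames")
    total = 0.0
    for row, target in zip(attention, frame_to_token):
        for token_index, weight in enumerate(row):
            distance = token_index - target
            # Soft mask: 0 on the aligned token, approaching 1 far from it.
            mask = 1.0 - math.exp(-(distance ** 2) / (2.0 * sigma ** 2))
            total += weight * mask
    return total / len(attention)


# Two tokens, three frames: the first token spans two frames.
durations = [2, 1]
aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
shuffled = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
aligned_penalty = guided_attention_penalty(aligned, durations)
shuffled_penalty = guided_attention_penalty(shuffled, durations)
print(aligned_penalty, shuffled_penalty)
```

In this sketch an attention matrix that matches the prior alignment incurs no penalty, while one that attends to the wrong tokens is penalized. Adding such a term to the training loss is one plausible way for prior alignment knowledge to speed up attention learning.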

Bibliographic details

Source
IEEE International Conference on Acoustics, Speech and Signal Processing, 2021, pp. 6618-6622 (5 pages)
Authors
Yibin Zheng; Xinhui Li; Li Lu;
Original format: PDF
Keywords
Training; Adaptation models; Conferences; Linguistics; Signal processing; Feature extraction; Stability analysis;

