2018 IEEE Spoken Language Technology Workshop

Parameter Generation Algorithms for Text-To-Speech Synthesis with Recurrent Neural Networks


Abstract

Recurrent Neural Networks (RNN) have recently proved to be effective in acoustic modeling for TTS. Various techniques such as the Maximum Likelihood Parameter Generation (MLPG) algorithm have been naturally inherited from the HMM-based speech synthesis framework. This paper investigates in which situations parameter generation and variance restoration approaches help RNN-based TTS. We explore how their performance is affected by various factors such as the choice of the loss function, the application of regularization methods, and the amount of training data. We propose an efficient way to calculate MLPG using a convolutional kernel. Our results show that the L1 loss with proper regularization outperforms any system built with the conventional L2 loss and does not require applying MLPG (which is necessary otherwise). We did not observe perceptual improvements when embedding MLPG into the acoustic model. Finally, we show that variance restoration approaches are important for cepstral features but only yield minor perceptual gains for the prediction of F0.
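The abstract refers to MLPG and variance restoration without detail. As a point of reference, the first sketch below shows the standard single-dimension MLPG solve with delta and delta-delta windows, assuming time-invariant predicted variances; the window coefficients, function name, and dense solver are illustrative assumptions, not the paper's convolutional-kernel formulation.

```python
# A minimal MLPG sketch for one feature dimension, assuming time-invariant
# predicted variances; this is the standard closed-form solution that a
# convolutional-kernel formulation would approximate or accelerate.
import numpy as np

def mlpg_1d(mean_static, mean_delta, mean_delta2,
            var_static=1.0, var_delta=1.0, var_delta2=1.0):
    """Solve (W^T D^-1 W) c = W^T D^-1 mu for the static trajectory c,
    where W stacks the identity with the delta / delta-delta window
    matrices and D holds the (here constant) predicted variances."""
    T = len(mean_static)
    # Window coefficients commonly used in HMM/RNN-based TTS.
    w_delta = np.array([-0.5, 0.0, 0.5])
    w_delta2 = np.array([1.0, -2.0, 1.0])

    def window_matrix(w):
        W = np.zeros((T, T))
        for t in range(T):
            for k, off in enumerate((-1, 0, 1)):
                if 0 <= t + off < T:
                    W[t, t + off] = w[k]
        return W

    W = np.vstack([np.eye(T), window_matrix(w_delta), window_matrix(w_delta2)])
    d_inv = np.concatenate([np.full(T, 1.0 / var_static),
                            np.full(T, 1.0 / var_delta),
                            np.full(T, 1.0 / var_delta2)])
    mu = np.concatenate([mean_static, mean_delta, mean_delta2])
    A = (W.T * d_inv) @ W      # W^T D^-1 W, banded and positive definite
    b = (W.T * d_inv) @ mu     # W^T D^-1 mu
    return np.linalg.solve(A, b)
```

With constant variances, the map from predicted means to the smoothed trajectory is time-invariant away from the utterance boundaries, so its rows are shifted copies of a single kernel; presumably this is what allows MLPG to be computed as a 1-D convolution as the paper proposes, although the abstract does not give the exact kernel construction. The second sketch shows one common, simple form of variance restoration, variance scaling toward a reference global variance; it is an assumption about the family of methods meant, not necessarily the exact approach evaluated in the paper.

```python
def variance_scaling(pred, gv_ref, eps=1e-8):
    """Rescale each feature dimension of pred (T x D) so its utterance-level
    variance matches a reference global variance gv_ref (D,), e.g. computed
    on natural speech. Counteracts over-smoothing of predicted cepstra."""
    mu = pred.mean(axis=0, keepdims=True)
    std = pred.std(axis=0, keepdims=True)
    return mu + (pred - mu) * np.sqrt(gv_ref) / (std + eps)
```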
