International Conference on Speech and Computer

Deep Recurrent Neural Networks in Speech Synthesis Using a Continuous Vocoder



Abstract

In our earlier work in statistical parametric speech synthesis, we proposed a vocoder using continuous F0 in combination with Maximum Voiced Frequency (MVF), which was successfully used with a feed-forward deep neural network (DNN). The advantage of a continuous vocoder in this scenario is that its parameters are simpler to model than those of traditional vocoders with discontinuous F0. However, DNNs lack sequence modeling, which can degrade the quality of the synthesized speech. To avoid this problem, we propose sequence-to-sequence modeling with recurrent neural networks (RNNs). In this paper, four neural network architectures (long short-term memory (LSTM), bidirectional LSTM (BLSTM), gated recurrent unit (GRU), and standard RNN) are investigated and applied with this continuous vocoder to model F0, MVF, and Mel-Generalized Cepstrum (MGC) for more natural-sounding speech synthesis. Experimental results from objective and subjective evaluations show that the proposed framework converges faster and achieves state-of-the-art speech synthesis performance, outperforming the conventional feed-forward DNN.
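The acoustic model described in the abstract maps per-frame linguistic features to the continuous vocoder's parameter streams (MGC, continuous F0, MVF) with a recurrent network. The following is a minimal NumPy sketch of a single-layer LSTM doing such a frame-by-frame mapping; the layer sizes, initialization, and the 60+1+1 output split are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTM:
    """Minimal single-layer LSTM acoustic model (illustrative sketch).

    Maps a sequence of linguistic feature vectors to per-frame vocoder
    parameters: MGC coefficients, continuous F0, and MVF. All dimensions
    here are assumptions for demonstration only.
    """
    def __init__(self, in_dim, hidden_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One stacked weight matrix covering the input, forget,
        # cell-candidate, and output gates.
        self.W = rng.normal(0.0, 0.1, (4 * hidden_dim, in_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.Wy = rng.normal(0.0, 0.1, (out_dim, hidden_dim))
        self.by = np.zeros(out_dim)
        self.h_dim = hidden_dim

    def forward(self, x_seq):
        h = np.zeros(self.h_dim)  # hidden state carried across frames
        c = np.zeros(self.h_dim)  # cell state carried across frames
        outputs = []
        for x in x_seq:  # one vocoder frame at a time
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, g, o = np.split(z, 4)
            i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
            c = f * c + i * np.tanh(g)       # update cell state
            h = o * np.tanh(c)               # update hidden state
            outputs.append(self.Wy @ h + self.by)
        return np.stack(outputs)

# Assumed stream sizes: 60 MGC + 1 continuous log-F0 + 1 MVF = 62 outputs.
model = TinyLSTM(in_dim=300, hidden_dim=64, out_dim=62)
frames = np.random.default_rng(1).normal(size=(100, 300))  # 100 input frames
params = model.forward(frames)
print(params.shape)  # (100, 62): one vocoder parameter vector per frame
```

The recurrence over `h` and `c` is what gives the model the sequence memory that a feed-forward DNN lacks; a BLSTM variant would run a second pass over the reversed frame sequence and concatenate the two hidden states.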
