International Conference on Speech and Computer

Deep Recurrent Neural Networks in Speech Synthesis Using a Continuous Vocoder



Abstract

In our earlier work in statistical parametric speech synthesis, we proposed a vocoder using continuous F0 in combination with Maximum Voiced Frequency (MVF), which was successfully used with a feed-forward deep neural network (DNN). The advantage of a continuous vocoder in this scenario is that its parameters are simpler to model than those of traditional vocoders with discontinuous F0. However, DNNs lack sequence modeling, which might degrade the quality of synthesized speech. To avoid this problem, we propose the use of sequence-to-sequence modeling with recurrent neural networks (RNNs). In this paper, four neural network architectures (long short-term memory (LSTM), bidirectional LSTM (BLSTM), gated recurrent unit (GRU), and standard RNN) are investigated and applied with this continuous vocoder to model F0, MVF, and Mel-Generalized Cepstrum (MGC) for more natural-sounding speech synthesis. Experimental results from objective and subjective evaluations have shown that the proposed framework converges faster and gives state-of-the-art speech synthesis performance while outperforming the conventional feed-forward DNN.
