Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Effective Spectral and Excitation Modeling Techniques for LSTM-RNN-Based Speech Synthesis Systems



Abstract

In this paper, we report research results on modeling the parameters of an improved time-frequency trajectory excitation (ITFTE) and spectral envelopes of an LPC vocoder with a long short-term memory (LSTM)-based recurrent neural network (RNN) for high-quality text-to-speech (TTS) systems. The ITFTE vocoder has been shown to significantly improve the perceptual quality of statistical parameter-based TTS systems in our prior work. However, a simple feed-forward deep neural network (DNN) with a finite window length is inadequate to capture the time evolution of the ITFTE parameters. We propose to use the LSTM to exploit the time-varying nature of both trajectories of the excitation and filter parameters, where the LSTM is implemented to use the linguistic text input and to predict both ITFTE and LPC parameters holistically. In the case of LPC parameters, we further enhance the generated spectrum by applying LP bandwidth expansion and line spectral frequency-sharpening filters. These filters are not only beneficial for reducing unstable synthesis filter conditions but also advantageous toward minimizing the muffling problem in the generated spectrum. Experimental results have shown that the proposed LSTM-RNN system with the ITFTE vocoder significantly outperforms both similarly configured band aperiodicity-based systems and our best prior DNN-trained counterpart, both objectively and subjectively.
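
The abstract names LP bandwidth expansion as one way to reduce unstable synthesis filter conditions but does not give the filter itself. As an illustration only, the textbook form of bandwidth expansion scales each LPC coefficient a_k by gamma^k, which evaluates A(z / gamma) and pulls every pole toward the origin by the factor gamma. The sketch below assumes that textbook form; the function name `bandwidth_expand`, the value gamma = 0.98, and the second-order test filter are hypothetical and are not taken from the paper.

```python
import numpy as np

def bandwidth_expand(lpc, gamma=0.98):
    """Scale LPC coefficient a_k by gamma**k, i.e. evaluate A(z / gamma).

    `lpc` holds [1, a_1, ..., a_p] for A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.
    gamma = 0.98 is an assumed illustrative value, not the paper's setting.
    """
    lpc = np.asarray(lpc, dtype=float)
    return lpc * gamma ** np.arange(len(lpc))

# Hypothetical 2nd-order synthesis filter 1/A(z) with a pole pair at
# radius 1.01, i.e. slightly outside the unit circle (unstable).
r, theta = 1.01, np.pi / 4
a = np.array([1.0, -2.0 * r * np.cos(theta), r ** 2])

a_exp = bandwidth_expand(a, gamma=0.98)

# The poles of 1/A(z) are the roots of z^p + a_1 z^(p-1) + ... + a_p.
pole_radius_before = np.abs(np.roots(a)).max()
pole_radius_after = np.abs(np.roots(a_exp)).max()

print(pole_radius_before, pole_radius_after)
```

Because the scaling multiplies every pole by gamma, the expanded filter here has its poles at radius 0.98 × 1.01, inside the unit circle. The cost is wider formant bandwidths, which is presumably why the abstract pairs this step with a sharpening filter.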
