IEEE International Conference on Acoustics, Speech and Signal Processing

Utterance-Level Sequential Modeling for Deep Gaussian Process Based Speech Synthesis Using Simple Recurrent Unit



Abstract

This paper presents a deep Gaussian process (DGP) model with a recurrent architecture for speech sequence modeling. DGP is a Bayesian deep model that can be trained effectively while accounting for model complexity, and it is a kernel regression model with high expressive power. Previous studies showed that DGP-based speech synthesis outperformed a neural-network-based counterpart, where both models used a feed-forward architecture. To improve the naturalness of synthetic speech, in this paper we show that DGP can be applied to utterance-level modeling using recurrent architecture models. We adopt the simple recurrent unit (SRU) to realize the recurrent architecture of the proposed model, enabling fast speech parameter generation by exploiting the highly parallelizable structure of the SRU. Objective and subjective evaluation results show that the proposed SRU-DGP-based speech synthesis outperforms not only feed-forward DGP but also automatically tuned SRU- and long short-term memory (LSTM)-based neural networks.
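The abstract attributes fast parameter generation to the SRU's parallelizable structure: all matrix multiplications are independent of time and can be batched across the whole utterance, leaving only cheap element-wise operations in the sequential recurrence. A minimal NumPy sketch of a single SRU layer illustrates this (this follows the standard SRU formulation, not the paper's own implementation; all variable names and the tanh output nonlinearity are our assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sru_forward(X, W, Wf, Wr, vf, vr, bf, br):
    """Run one SRU layer over an input sequence X of shape (T, d).

    The three matrix products below have no time dependence, so they
    are computed for all T frames at once; only the element-wise
    update of the cell state c is sequential.
    """
    U  = X @ W.T    # candidate values, all time steps in parallel
    Uf = X @ Wf.T   # forget-gate pre-activations
    Ur = X @ Wr.T   # reset-gate pre-activations

    T, d = X.shape
    c = np.zeros(d)
    H = np.empty((T, d))
    for t in range(T):
        f = sigmoid(Uf[t] + vf * c + bf)          # forget gate
        r = sigmoid(Ur[t] + vr * c + br)          # reset (highway) gate
        c = f * c + (1.0 - f) * U[t]              # element-wise cell update
        H[t] = r * np.tanh(c) + (1.0 - r) * X[t]  # gated highway output
    return H

# Usage with random parameters (d: feature dim, T: number of frames)
rng = np.random.default_rng(0)
d, T = 4, 6
X = rng.standard_normal((T, d))
W, Wf, Wr = (0.1 * rng.standard_normal((d, d)) for _ in range(3))
vf, vr = rng.standard_normal(d), rng.standard_normal(d)
bf, br = np.zeros(d), np.zeros(d)
H = sru_forward(X, W, Wf, Wr, vf, vr, bf, br)
```

Because the per-step loop contains only element-wise arithmetic, its cost is O(T·d) rather than the O(T·d²) of a conventional LSTM recurrence, which is the property the paper exploits for fast generation.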


