IEEE/ACM Transactions on Audio, Speech, and Language Processing

Statistical Parametric Speech Synthesis Using Deep Gaussian Processes



Abstract

This paper proposes a framework for speech synthesis based on deep Gaussian processes (DGPs), a deep architecture composed of stacked Bayesian kernel regressions. In this method, we train a statistical model mapping contextual features to speech parameters, in a manner similar to deep neural network (DNN)-based speech synthesis. To apply DGPs to a statistical parametric speech synthesis framework, our framework uses an approximation method, doubly stochastic variational inference, which scales to an arbitrary amount of data. Since DGP training is based on the marginal likelihood, which accounts for model complexity as well as data fit, DGPs are less vulnerable to overfitting than DNNs. In experimental evaluations, we compared the performance of the proposed DGP-based framework with that of a feedforward DNN-based one. Subjective and objective evaluation results showed that our DGP framework yielded a higher mean opinion score and lower acoustic feature distortions than the conventional framework.
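To illustrate the "stacked Bayesian kernel regressions" idea, here is a minimal numpy sketch of a two-layer GP regression cascade mapping input features to output parameters. It is not the paper's method: the paper trains the latent layers with doubly stochastic variational inference over inducing points, whereas this toy uses a fixed random nonlinear projection as a stand-in for the inferred hidden targets, and exact GP posterior means instead of variational approximations. All function names and parameters here are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel k(x, x') = v * exp(-||x - x'||^2 / (2 l^2)).
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2.0 * X1 @ X2.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior_mean(X_train, Y_train, X_test, noise=1e-2):
    # Exact GP regression posterior mean: one "layer" of Bayesian kernel regression.
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = rbf_kernel(X_test, X_train)
    return Ks @ np.linalg.solve(K, Y_train)

def dgp_predict(X_train, Y_train, X_test, hidden_dim=3, n_layers=2, seed=0):
    # Stack GP regressions: each layer's posterior mean feeds the next layer.
    rng = np.random.default_rng(seed)
    H_train, H_test = X_train, X_test
    for _ in range(n_layers - 1):
        # Illustrative shortcut: a random smooth projection plays the role of
        # the latent-layer values that DSVI would actually infer.
        W = rng.standard_normal((H_train.shape[1], hidden_dim))
        H_target = np.tanh(H_train @ W)
        H_test = gp_posterior_mean(H_train, H_target, H_test)
        H_train = H_target
    # Final layer regresses the hidden representation onto the output parameters.
    return gp_posterior_mean(H_train, Y_train, H_test)
```

In the real framework the inputs would be contextual (linguistic) features and the outputs acoustic parameters, and each layer's marginal likelihood term penalizes complexity, which is the source of the overfitting robustness discussed above.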
