2017 IEEE Automatic Speech Recognition and Understanding Workshop

Statistical parametric speech synthesis using generative adversarial networks under a multi-task learning framework



Abstract

In this paper, we aim at improving the performance of synthesized speech in statistical parametric speech synthesis (SPSS) based on a generative adversarial network (GAN). In particular, we propose a novel architecture that combines the traditional acoustic loss function and the GAN's discriminative loss under a multi-task learning (MTL) framework. The mean squared error (MSE) is usually used to estimate the parameters of deep neural networks, but it considers only the numerical difference between the raw audio and the synthesized one. To mitigate this problem, we introduce the GAN as a second task that determines whether the input is natural speech under the given conditions. In this MTL framework, the MSE optimization improves the stability of the GAN, while the GAN drives the generated samples toward a distribution closer to that of natural speech. Listening tests show that the multi-task architecture generates speech that is perceived as more natural than that of conventional methods.
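To make the multi-task objective concrete, the sketch below shows one way such training could look in PyTorch: the generator is updated with the usual MSE acoustic loss plus a weighted adversarial loss from a discriminator that judges whether acoustic features look natural. This is only an illustrative assumption, not the authors' implementation; the feature dimensions, network sizes, the weight `lambda_adv`, and the use of an unconditional discriminator (the paper conditions it on specific inputs) are all simplifications.

```python
# Minimal sketch of an MSE + adversarial multi-task objective (assumed setup).
import torch
import torch.nn as nn

FEAT_DIM, LING_DIM, HIDDEN = 60, 300, 256  # assumed feature/input/hidden sizes
lambda_adv = 0.1                           # assumed weight of the adversarial task

generator = nn.Sequential(                 # linguistic features -> acoustic features
    nn.Linear(LING_DIM, HIDDEN), nn.ReLU(),
    nn.Linear(HIDDEN, FEAT_DIM),
)
discriminator = nn.Sequential(             # acoustic features -> "natural?" logit
    nn.Linear(FEAT_DIM, HIDDEN), nn.ReLU(),
    nn.Linear(HIDDEN, 1),
)

mse = nn.MSELoss()
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)

def train_step(ling, natural_feats):
    """One update on a batch of linguistic inputs and natural acoustic features."""
    # Discriminator: natural features labelled 1, generated features labelled 0.
    fake = generator(ling).detach()
    d_loss = bce(discriminator(natural_feats), torch.ones(len(natural_feats), 1)) \
           + bce(discriminator(fake), torch.zeros(len(fake), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: MSE as the main task plus the adversarial loss as the second task.
    pred = generator(ling)
    g_loss = mse(pred, natural_feats) \
           + lambda_adv * bce(discriminator(pred), torch.ones(len(pred), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# Toy usage with random tensors in place of real linguistic/acoustic features.
ling = torch.randn(8, LING_DIM)
feats = torch.randn(8, FEAT_DIM)
print(train_step(ling, feats))
```

The key design point the abstract describes is visible in the generator update: the MSE term anchors training and stabilizes the adversarial game, while the weighted adversarial term pushes the generated feature distribution toward that of natural speech.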

