2018 IEEE International Work Conference on Bioinspired Intelligence

Pre-training Long Short-term Memory Neural Networks for Efficient Regression in Artificial Speech Postfiltering



Abstract

Several attempts to enhance statistical parametric speech synthesis have employed deep-learning-based postfilters, which learn a mapping from the synthetic speech parameters to the natural ones, reducing the gap between them. In this paper, we introduce a new pre-training approach for neural networks, applied to LSTM-based postfilters for speech synthesis, with the objective of enhancing the quality of the synthesized speech more efficiently. Our approach begins with an auto-regressive training of one LSTM network, whose weights are then used to initialize postfilters based on a denoising autoencoder architecture. We show the advantages of this initialization on a set of multi-stream postfilters, which comprise a collection of denoising autoencoders for the MFCC and fundamental-frequency parameters of the artificial voice. Results show that this initialization lowers the training time of the LSTM networks and, in most cases, enhances the statistical parametric speech better than the commonly used randomly initialized networks.
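The two-stage idea in the abstract, auto-regressive pre-training followed by reuse of the learned weights to initialize a denoising postfilter, can be illustrated with a deliberately simplified stand-in: a linear map trained by gradient descent takes the place of the paper's LSTMs, and synthetic AR(1) trajectories stand in for the MFCC/F0 streams. Everything below (data, dimensions, learning rates) is an illustrative assumption, not the authors' setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for speech parameter trajectories: "natural" frames y follow
# a smooth AR(1) process; "synthetic" frames x are a degraded (noisy) copy.
T, D = 200, 8
y = np.zeros((T, D))
for t in range(1, T):
    y[t] = 0.95 * y[t - 1] + rng.standard_normal(D)
x = y + 0.3 * rng.standard_normal((T, D))


def mse(W, inp, tgt):
    return float(np.mean((inp @ W - tgt) ** 2))


def train(W, inp, tgt, lr=0.01, steps=500):
    """Gradient descent on a linear map (stand-in for LSTM training)."""
    for _ in range(steps):
        grad = inp.T @ (inp @ W - tgt) / len(inp)
        W -= lr * grad
    return W


# Stage 1: auto-regressive pre-training -- learn to predict the next
# natural frame from the current one.
W_ar = train(0.1 * rng.standard_normal((D, D)), y[:-1], y[1:])

# Stage 2: the pre-trained weights initialize the postfilter, which is
# fine-tuned briefly on the denoising mapping x -> y.
W_pre = train(W_ar.copy(), x, y, steps=10)

# Baseline: random initialization, fine-tuned for the same short budget.
W_rand = train(0.1 * rng.standard_normal((D, D)), x, y, steps=10)

print(mse(W_pre, x, y), mse(W_rand, x, y))
```

Because next-frame prediction on smooth trajectories already pushes the weights toward the structure the denoising map needs, the pre-trained postfilter reaches a lower error within the same small fine-tuning budget, mirroring the reduced-training-time claim at toy scale.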
