Conference: IEEE International Conference on Acoustics, Speech and Signal Processing

A COMPARISON OF RECENT WAVEFORM GENERATION AND ACOUSTIC MODELING METHODS FOR NEURAL-NETWORK-BASED SPEECH SYNTHESIS

Abstract

Recent advances in speech synthesis suggest that limitations such as the lossy nature of the amplitude spectrum with minimum phase approximation and the over-smoothing effect in acoustic modeling can be overcome by using advanced machine learning approaches. In this paper, we build a framework in which new vocoding and acoustic modeling techniques can be fairly compared with conventional approaches by means of a large-scale crowdsourced evaluation. Results on acoustic models showed that generative adversarial networks and an autoregressive (AR) model both performed better than a normal recurrent network, with the AR model performing best. An evaluation of vocoders using the same AR acoustic model demonstrated that a Wavenet vocoder outperformed classical source-filter-based vocoders. In particular, speech waveforms generated by the combination of the AR acoustic model and the Wavenet vocoder achieved a speech quality score similar to that of vocoded speech.
