Conference: IEEE International Conference on Acoustics, Speech and Signal Processing

Semi-Supervised Learning Based on Hierarchical Generative Models for End-to-End Speech Synthesis


Abstract

This paper proposes a general framework of semi-supervised learning based on hierarchical generative models and adapts it to a Japanese end-to-end text-to-speech (TTS) system. In English TTS, several end-to-end systems have recently achieved sound quality close to that of natural human speech. However, in non-alphabetic languages such as Japanese, it is difficult to realize true text-input end-to-end TTS due to character diversity and pitch accents. To address this problem, we propose end-to-end TTS based on semi-supervised learning that makes the most of existing data consisting of any combination of text, phoneme, and waveform as training data. To demonstrate the effectiveness of the proposed system, listening tests were conducted for pronunciation and naturalness. Our results show that the proposed system improves both pronunciation and naturalness.
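As a rough illustration of how training data made of "any combination of text, phoneme, and waveform" might be used, the sketch below sorts samples by which modalities they carry and lists the model components each sample could supervise. It assumes a text → phoneme → waveform hierarchy, which the abstract does not spell out. The `Sample` class, the `trainable_terms` function, and the term names are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    """One training item; any field may be missing (hypothetical structure)."""
    text: Optional[str] = None               # raw orthographic text
    phoneme: Optional[str] = None            # phoneme (and accent) sequence
    waveform: Optional[List[float]] = None   # audio samples


def trainable_terms(s: Sample) -> List[str]:
    """List the hypothetical model terms this sample can supervise.

    Assumes a text -> phoneme -> waveform hierarchy (an assumption of this
    sketch, not a detail stated in the abstract).
    """
    terms = []
    if s.text is not None and s.phoneme is not None:
        terms.append("text->phoneme")
    if s.phoneme is not None and s.waveform is not None:
        terms.append("phoneme->waveform")
    # With the middle layer missing, the phoneme sequence is treated as latent.
    if s.text is not None and s.waveform is not None and s.phoneme is None:
        terms.append("text->waveform (phoneme latent)")
    return terms


corpus = [
    Sample(text="橋", phoneme="ha shi"),            # pronunciation dictionary entry
    Sample(phoneme="ha shi", waveform=[0.0, 0.1]),   # labeled speech
    Sample(text="橋", waveform=[0.0, 0.1]),          # text and audio only
]
for sample in corpus:
    print(trainable_terms(sample))
```

In this simplified sketch a sample with only one modality supervises nothing, although a full generative model could in principle still use such data.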
