Semi-Supervised Learning Based on Hierarchical Generative Models for End-to-End Speech Synthesis

Abstract

This paper proposes a general framework of semi-supervised learning based on hierarchical generative models and adapts it to a Japanese end-to-end text-to-speech (TTS) system. In English TTS, several end-to-end systems have recently achieved sound quality close to that of natural human speech. However, in non-alphabetic languages such as Japanese, it is difficult to realize true text-input end-to-end TTS due to character diversity and pitch accents. To address this problem, we propose end-to-end TTS based on semi-supervised learning that makes the most of existing data consisting of any combination of text, phoneme, and waveform as training data. To demonstrate the effectiveness of the proposed system, listening tests were conducted for pronunciation and naturalness. Our results show that the proposed system improves both pronunciation and naturalness.

Bibliographic details

Source
IEEE International Conference on Acoustics, Speech and Signal Processing | 2020 | pp. 7644-7648 | 5 pages
Authors
Takato Fujimoto; Shinji Takaki; Kei Hashimoto; Keiichiro Oura; Yoshihiko Nankaku; Keiichi Tokuda
Original format: PDF
Keywords
End-to-end speech synthesis; semi-supervised learning; hierarchical generative model; variational auto-encoder; Japanese speech synthesis

Related literature

1. Transductive active learning - A new semi-supervised learning approach based on iteratively refined generative models to capture structure in data [J]. Reitmaier Tobias, Calma Adrian, Sick Bernhard. Information Sciences: An International Journal. 2015
2. Probabilistic Representation and Inverse Design of Metamaterials Based on a Deep Generative Model with Semi-Supervised Learning Strategy [J]. Ma Wei, Cheng Feng, Xu Yihao. Advanced Materials. 2019, No. 35
3. Semi-Supervised Learning Based on Hierarchical Generative Models for End-to-End Speech Synthesis [C]. Takato Fujimoto, Shinji Takaki, Kei Hashimoto. IEEE International Conference on Acoustics, Speech and Signal Processing. 2020
4. Graph-based Semi-Supervised Learning in Acoustic Modeling for Automatic Speech Recognition [D]. Liu, Yuzong. 2016
5. A Bayesian generative model for learning semantic hierarchies [O]. Roni Mittelman, Min Sun, Benjamin Kuipers. 2014
6. End-to-End Emotional Speech Synthesis Using Style Tokens and Semi-Supervised Training [O]. Pengfei Wu, Zhenhua Ling, Lijuan Liu. 2019
