TTS-by-TTS: TTS-Driven Data Augmentation for Fast and High-Quality Speech Synthesis


Abstract

In this paper, we propose a text-to-speech (TTS)-driven data augmentation method for improving the quality of a non-autoregressive (non-AR) TTS system. Recently proposed non-AR models, such as FastSpeech 2, have achieved fast speech synthesis. However, their quality is not satisfactory, especially when the amount of training data is insufficient. To address this problem, we propose an effective data augmentation method using a well-designed autoregressive (AR) TTS system. In this method, the AR TTS system generates large-scale synthetic corpora of text-waveform pairs with phoneme durations, which are then used to train the target non-AR model. Perceptual listening test results showed that the proposed method significantly improved the quality of the non-AR TTS system. In particular, we augmented a five-hour training database with a 179-hour synthetic one. Using these databases, our TTS system, which consists of a FastSpeech 2 acoustic model and a Parallel WaveGAN vocoder, achieved a mean opinion score (MOS) of 3.74, which is 40% higher than that achieved by the conventional method.
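The abstract describes a two-stage pipeline: an AR teacher synthesizes speech and phoneme durations for new text, and the resulting corpus trains a non-AR student such as FastSpeech 2. The sketch below illustrates only the data side, under stated assumptions. It assumes a hypothetical teacher that returns a waveform together with an encoder-decoder attention matrix, and it derives durations with the FastSpeech-style rule of counting, for each phoneme, the decoder frames whose strongest attention falls on it. The abstract does not say how the durations are obtained, so this extraction rule, the `Utterance` record, the `augment` helper, and the `toy_teacher` stub are all illustrative, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

import numpy as np


@dataclass
class Utterance:
    """One training example for the non-AR student (illustrative record type)."""
    phonemes: List[str]
    waveform: np.ndarray   # audio samples
    durations: np.ndarray  # decoder frames per phoneme
    synthetic: bool        # True if generated by the AR teacher


# Hypothetical teacher interface: phonemes -> (waveform, attention),
# where attention has shape (decoder_frames, num_phonemes).
Teacher = Callable[[Sequence[str]], Tuple[np.ndarray, np.ndarray]]


def durations_from_attention(attention: np.ndarray) -> np.ndarray:
    """Assign each decoder frame to its most-attended phoneme and count frames."""
    _, num_phonemes = attention.shape
    best = attention.argmax(axis=1)  # phoneme index for every frame
    # minlength keeps zero-duration phonemes in the output
    return np.bincount(best, minlength=num_phonemes)


def build_synthetic_corpus(
    texts: Sequence[Sequence[str]], teacher: Teacher
) -> List[Utterance]:
    """Run the AR teacher on text to create text-waveform-duration triples."""
    corpus = []
    for phonemes in texts:
        waveform, attention = teacher(phonemes)
        durations = durations_from_attention(attention)
        corpus.append(Utterance(list(phonemes), waveform, durations, synthetic=True))
    return corpus


def augment(
    real: List[Utterance], texts: Sequence[Sequence[str]], teacher: Teacher
) -> List[Utterance]:
    """Concatenate the recorded corpus with the teacher-generated one."""
    return real + build_synthetic_corpus(texts, teacher)


def toy_teacher(
    phonemes: Sequence[str], frames_per_phoneme: int = 3, hop: int = 256
) -> Tuple[np.ndarray, np.ndarray]:
    """Stand-in for a real AR TTS model: monotonic hard attention, silent audio."""
    n = len(phonemes)
    frames = n * frames_per_phoneme
    attention = np.zeros((frames, n))
    attention[np.arange(frames), np.arange(frames) // frames_per_phoneme] = 1.0
    waveform = np.zeros(frames * hop, dtype=np.float32)
    return waveform, attention


if __name__ == "__main__":
    synthetic = build_synthetic_corpus([["h", "a", "i"], ["k", "o"]], toy_teacher)
    print(synthetic[0].durations)  # [3 3 3]
```

With the toy teacher, every phoneme receives three frames, and the durations of each utterance sum to its number of decoder frames. Keeping a `synthetic` flag on each record leaves room to weight or filter recorded and generated utterances differently, a choice the abstract does not address.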

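The reported figures imply two back-of-the-envelope quantities. This assumes that "40% higher" denotes a relative increase over the baseline MOS. The abstract does not state the baseline value, and the 40% is presumably rounded, so the derived baseline is approximate:

$$
\frac{179\ \text{h}}{5\ \text{h}} = 35.8, \qquad
\mathrm{MOS}_{\text{baseline}} \approx \frac{3.74}{1 + 0.40} \approx 2.67
$$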

Bibliographic record

Source
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 6598-6602 (5 pages)
Authors
Min-Jae Hwang; Ryuichi Yamamoto; Eunwoo Song; Jae-Min Kim
Original format: PDF
Keywords
Training; Databases; Vocoders; Conferences; Training data; Signal processing; Acoustics;

Similar literature

Foreign-language literature

1. Multiple-Prosody Speech Databases and Their Effectiveness in High-Quality Speech Synthesis at Arbitrary Rates [J]. Tsuyoshi Masuda, Tomoki Toda, Hiromichi Kawanami. Electronics and Communications in Japan, Part 2 (Electronics), 2005, no. 9.
2. Data Augmentation Using Virtual Microphone Array Synthesis and Multi-Resolution Feature Extraction for Isolated Word Dysarthric Speech Recognition [J]. Celin T. A. Mariya, Nagarajan T., Vijayalakshmi P. IEEE Journal of Selected Topics in Signal Processing, 2020, no. 2.
3. [Title partly lost in extraction; the machine translation renders it as "Ultrafast synthesis and high-quality β-NaYF4:Yb3+,Ln3+ short nanorods"] [J]. Fabrizio Guzzetta, Anna Roig, Beatriz Julián-López. Journal of Physical Chemistry Letters, 2017, no. 23.
4. Multi-speaker Sequence-to-sequence Speech Synthesis for Data Augmentation in Acoustic-to-word Speech Recognition [C]. Sei Ueno, Masato Mimura, Shinsuke Sakai. IEEE International Conference on Acoustics, Speech and Signal Processing, 2019.
5. High-quality enhanced waveform interpolative coding of speech at low bit-rate [D]. Oded Gottesman, 2000.
6. Fast and Cost-Effective Synthesis of High-Quality Graphene on Copper Foils Using High-Current Arc Evaporation [O]. Helge Lux, Matthias Edling, Peter Siemroth, 2018.
7. TTS-by-TTS: TTS-Driven Data Augmentation for Fast and High-Quality Speech Synthesis [O]. Min-Jae Hwang, Ryuichi Yamamoto, Eunwoo Song, 2021.