Natural Text-to-Speech Synthesis by Conditioning Spectrogram Predictions from Transformer Network on WaveGlow Vocoder

Abstract

Text-to-speech (TTS) is a form of speech synthesis in which text is converted into human-like spoken output. State-of-the-art TTS methods employ neural-network-based approaches. This work examines some of the issues and limitations of current systems, specifically Tacotron-2, and attempts to improve on its performance by modifying its architecture. The modified model uses a Transformer network as the Spectrogram Prediction Network (SPN) and WaveGlow as the Audio Generation Network (AGN). The modified model shows improvements in the speech output generated for the corresponding texts and in the inference time for audio generation, and obtains a Mean Opinion Score (MOS) of 4.10 (out of 5).
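
The abstract reports quality as a Mean Opinion Score, which is the arithmetic mean of listener ratings on a 1-to-5 scale. As a minimal sketch of that calculation, the snippet below averages a set of ratings and adds a normal-approximation 95% confidence interval. The function name `mean_opinion_score`, the ratings and the choice of interval are assumptions made for illustration; the record does not describe the paper's listening-test protocol, and the numbers here are not the paper's data.

```python
from math import sqrt
from statistics import mean, stdev


def mean_opinion_score(ratings):
    """Return (MOS, 95% CI half-width) for listener ratings on a 1-5 scale.

    The interval uses a normal approximation (z = 1.96), which is an
    illustrative choice and not necessarily what the paper used.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie in [1, 5]")
    mos = mean(ratings)
    half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings for one synthesized utterance (NOT data from the paper).
example_ratings = [4, 5, 4, 3, 5, 4, 4, 4]
mos, ci = mean_opinion_score(example_ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

In practice, ratings are usually pooled over many utterances and listeners before averaging, so the interval narrows as the number of ratings grows.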

Bibliographic details

Source
IEEE International Conference on Soft Computing and Machine Intelligence, 2020, pp. 255-259 (5 pages)
Authors
G Sanjay; K C Sooraj; Deepak Mishra
Original format
PDF
Keywords
Predictive models; Spectrogram; Speech synthesis; Decoding; Convolution; Computer architecture; Mathematical model;

Similar literature

1. Vocoder-free text-to-speech synthesis incorporating generative adversarial networks using low-/multi-frequency STFT amplitude spectra [J]. Saito Yuki, Takamichi Shinnosuke, Saruwatari Hiroshi. Computer Speech and Language. 2019, issue NOVa
2. Prosody modeling for syllable based text-to-speech synthesis using feedforward neural networks [J]. Reddy V. Ramu, Rao K. Sreenivasa. Neurocomputing. 2016, issue JANa1
3. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions [C]. Jonathan Shen, Ruoming Pang, Ron J. Weiss, IEEE International Conference on Acoustics, Speech and Signal Processing. 2018
4. Improving high quality concatenative text-to-speech synthesis using the circular linear prediction model [D]. Shukla, Sunil Ravindra. 2007
5. A Modified Aging Kinetics Model for Aging Condition Prediction of Transformer Polymer Insulation by Employing the Frequency Domain Spectroscopy [O]. Jiefeng Liu, Xianhao Fan, Yiyi Zhang, 2019
6. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions [O]. Shen, Jonathan, Pang, Ruoming, Weiss, Ron J., 2017
