A COMPARISON OF RECENT WAVEFORM GENERATION AND ACOUSTIC MODELING METHODS FOR NEURAL-NETWORK-BASED SPEECH SYNTHESIS

Abstract

Recent advances in speech synthesis suggest that limitations such as the lossy nature of the amplitude spectrum with minimum phase approximation and the over-smoothing effect in acoustic modeling can be overcome by using advanced machine learning approaches. In this paper, we build a framework in which we can fairly compare new vocoding and acoustic modeling techniques with conventional approaches by means of a large-scale crowdsourced evaluation. Results on acoustic models showed that generative adversarial networks and an autoregressive (AR) model performed better than a normal recurrent network, and that the AR model performed best. Evaluation of vocoders using the same AR acoustic model demonstrated that a Wavenet vocoder outperformed classical source-filter-based vocoders. In particular, speech waveforms generated by the combination of the AR acoustic model and the Wavenet vocoder achieved a speech quality score similar to that of vocoded speech.

Bibliographic record

Source
IEEE International Conference on Acoustics, Speech and Signal Processing | 2018 | pp. 4454-5088 | 5 pages
Authors
Xin Wang; Jaime Lorenzo-Trueba; Shinji Takaki; Lauri Juvela; Junichi Yamagishi
Original format
PDF
Chinese Library Classification
TN912-53
Keywords
speech synthesis; deep learning; Wavenet; generative adversarial network; autoregressive neural network

Similar literature

1. Noise and acoustic modeling with waveform generator in text-to-speech and neutral speech conversion [J]. Mohammed Salah Al-Radhi, Tamas Gabor Csapo, Geza Nemeth. Multimedia Tools and Applications. 2021, No. 2
2. Parameter Generation Methods With Rich Context Models for High-Quality and Flexible Text-To-Speech Synthesis [J]. Selected Topics in Signal Processing, IEEE Journal of. 2014, No. 2
3. Unit Selection Speech Synthesis Using Frame-Sized Speech Segments and Neural Network Based Acoustic Models [J]. Zhen-Hua Ling, Zhi-Ping Zhou. Journal of VLSI signal processing systems for signal, image, and video technology. 2018, No. 7
4. A COMPARISON OF RECENT WAVEFORM GENERATION AND ACOUSTIC MODELING METHODS FOR NEURAL-NETWORK-BASED SPEECH SYNTHESIS [C]. Xin Wang, Jaime Lorenzo-Trueba, Shinji Takaki, et al. IEEE International Conference on Acoustics, Speech and Signal Processing. 2018
5. Spectral analysis of pathological acoustic speech waveforms [D]. Medida, Priyanka. 2009
6. Comparison of Silent Navigator Waveform Generation Methods [O]. Yuji Iwadate, Atsushi Nozaki, Yoshinobu Nunokawa. 2020
7. Pole-zero modeling of transient waveforms: a comparison of methods with application to acoustic signals [O]. May Gary L. 1991
