Directly modeling speech waveforms by neural networks for statistical parametric speech synthesis


Abstract

This paper proposes a novel approach for directly modeling speech at the waveform level using a neural network. The approach uses the neural network-based statistical parametric speech synthesis framework with a specially designed output layer. Because acoustic feature extraction is integrated into acoustic model training, it can overcome limitations of conventional approaches, such as two-step optimization (feature extraction followed by acoustic modeling), the use of spectra rather than waveforms as targets, the use of overlapping and shifting frames as the unit, and a fixed decision tree structure. Experimental results show that the proposed approach can directly maximize the likelihood defined in the waveform domain.
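The abstract does not spell out the likelihood itself, so the following is only a minimal sketch of what "a likelihood defined in the waveform domain" could look like, not the authors' implementation. It assumes (these are assumptions, not statements from the abstract) that a waveform frame is a zero-mean stationary Gaussian signal whose spectrum is parameterized by cepstral coefficients, and it uses the Whittle frequency-domain approximation of the Gaussian log-likelihood. The function names and the idea that `cepstrum` would come from a network output layer are hypothetical.

```python
import numpy as np


def log_power_spectrum(cepstrum, n_fft):
    """Return log|H(e^{jw_k})|^2 at n_fft DFT frequencies.

    Assumption: H(z) = exp(sum_m c[m] z^-m), so that
    log|H|^2 = 2 * sum_m c[m] * cos(m * w_k).
    """
    omega = 2.0 * np.pi * np.arange(n_fft) / n_fft
    m = np.arange(len(cepstrum))
    return 2.0 * np.cos(np.outer(omega, m)) @ np.asarray(cepstrum, dtype=float)


def whittle_log_likelihood(frame, cepstrum):
    """Approximate log p(frame | cepstrum) for a zero-mean Gaussian frame.

    Whittle approximation: sum over DFT bins of
    -0.5 * (log(2*pi*S_k) + I_k / S_k), with I_k the periodogram.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    periodogram = np.abs(np.fft.fft(frame)) ** 2 / n
    log_s = log_power_spectrum(cepstrum, n)
    return -0.5 * np.sum(np.log(2.0 * np.pi) + log_s + periodogram * np.exp(-log_s))


# Usage: with only c[0] = log(sigma) the model is white noise with
# variance sigma**2, and the value matches the i.i.d. Gaussian log-likelihood.
frame = np.array([0.3, -1.2, 0.5, 0.8])
sigma = 0.5
print(whittle_log_likelihood(frame, [np.log(sigma)]))
```

In such a setup, the quantity above would serve as the training objective for the network that predicts `cepstrum`, which is one way feature extraction and acoustic modeling could be optimized jointly rather than in two steps.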


Bibliographic details

Source
IEEE International Conference on Acoustics, Speech and Signal Processing | 2015 | pp. 4215-4219 | 5 pages
Authors
Tokuda Keiichi; Zen Heiga
Original format: PDF
Keywords
Statistical parametric speech synthesis; adaptive cepstral analysis; neural network;


Similar literature


1. Neural Source-Filter Waveform Models for Statistical Parametric Speech Synthesis [J]. Xin Wang, Shinji Takaki, Junichi Yamagishi. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2020
2. GlotNet—A Raw Waveform Model for the Glottal Excitation in Statistical Parametric Speech Synthesis [J]. Juvela Lauri, Bollepalli Bajibabu, Tsiaras Vassilis. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2019, no. 6
3. Directly modeling speech waveforms by neural networks for statistical parametric speech synthesis [C]. K. Tokuda, H. Zen. IEEE International Conference on Acoustics, Speech and Signal Processing. 2015
4. Statistical Parametric Speech Synthesis using Deep Learning Architectures [D]. Kang, Shiyin. 2016
5. Discriminative Multi-Stream Postfilters Based on Deep Learning for Enhancing Statistical Parametric Speech Synthesis [O]. Marvin Coto-Jiménez. 2021
6. Directly Modeling Speech Waveforms by Neural Networks for Statistical Parametric Speech Synthesis [O]. Keiichi Tokuda, Heiga Zen. 2015
