IEEE International Conference on Acoustics, Speech and Signal Processing

Utterance-Level Sequential Modeling for Deep Gaussian Process Based Speech Synthesis Using Simple Recurrent Unit



Abstract

This paper presents a deep Gaussian process (DGP) model with a recurrent architecture for speech sequence modeling. DGP is a Bayesian deep model that can be trained effectively while accounting for model complexity, and it is a kernel regression model with high expressive power. Previous studies showed that DGP-based speech synthesis outperformed a neural-network-based counterpart, where both models used a feed-forward architecture. To improve the naturalness of synthetic speech, in this paper we show that DGP can be applied to utterance-level modeling using recurrent architecture models. We adopt the simple recurrent unit (SRU) to realize the recurrent architecture of the proposed model, enabling fast speech parameter generation by exploiting the highly parallelizable structure of the SRU. Objective and subjective evaluation results show that the proposed SRU-DGP-based speech synthesis outperforms not only feed-forward DGP but also automatically tuned SRU- and long short-term memory (LSTM)-based neural networks.
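The abstract attributes fast parameter generation to the SRU's parallelizable structure: all matrix multiplications are independent of time and can be batched across the whole utterance, leaving only cheap element-wise operations in the sequential recurrence. A minimal NumPy sketch of a single SRU layer illustrates this (this follows the standard SRU formulation, not the paper's own implementation; all variable names and the tanh output nonlinearity are our assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sru_forward(X, W, Wf, Wr, vf, vr, bf, br):
    """Run one SRU layer over an input sequence X of shape (T, d).

    The three matrix products below have no time dependence, so they
    are computed for all T frames at once; only the element-wise
    update of the cell state c is sequential.
    """
    U  = X @ W.T    # candidate values, all time steps in parallel
    Uf = X @ Wf.T   # forget-gate pre-activations
    Ur = X @ Wr.T   # reset-gate pre-activations

    T, d = X.shape
    c = np.zeros(d)
    H = np.empty((T, d))
    for t in range(T):
        f = sigmoid(Uf[t] + vf * c + bf)          # forget gate
        r = sigmoid(Ur[t] + vr * c + br)          # reset (highway) gate
        c = f * c + (1.0 - f) * U[t]              # element-wise cell update
        H[t] = r * np.tanh(c) + (1.0 - r) * X[t]  # gated highway output
    return H

# Usage with random parameters (d: feature dim, T: number of frames)
rng = np.random.default_rng(0)
d, T = 4, 6
X = rng.standard_normal((T, d))
W, Wf, Wr = (0.1 * rng.standard_normal((d, d)) for _ in range(3))
vf, vr = rng.standard_normal(d), rng.standard_normal(d)
bf, br = np.zeros(d), np.zeros(d)
H = sru_forward(X, W, Wf, Wr, vf, vr, bf, br)
```

Because the per-step loop contains only element-wise arithmetic, its cost is O(T·d) rather than the O(T·d²) of a conventional LSTM recurrence, which is the property the paper exploits for fast generation.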


