
Multi-speaker Emotional Acoustic Modeling for CNN-based Speech Synthesis


Abstract

In this paper, we investigate multi-speaker emotional acoustic modeling methods for a convolutional neural network (CNN) based speech synthesis system. For emotion modeling, we extend the speech synthesis system to learn a latent embedding space of emotion derived from a desired emotional identity, using an emotion code and a mel-frequency spectrogram as the emotion identity. To model speaker variation in a text-to-speech (TTS) system, we use speaker representations such as a trainable speaker embedding and a speaker code. We implemented speech synthesis systems combining these speaker and emotion representations and compared them experimentally. The results demonstrate that the multi-speaker emotional speech synthesis approach using a trainable speaker embedding together with the emotion representation derived from the mel spectrogram outperforms the other approaches in naturalness, speaker similarity, and emotion similarity.
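The conditioning scheme described in the abstract, a trainable speaker embedding combined with an emotion embedding extracted from a reference mel spectrogram, can be illustrated with a minimal PyTorch sketch. All module names, layer sizes, and the time-pooling choice below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class EmotionReferenceEncoder(nn.Module):
    """Maps a reference mel spectrogram to a fixed-size emotion embedding
    (a stand-in for the paper's mel-spectrogram emotion representation)."""
    def __init__(self, n_mels=80, emo_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_mels, 128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(128, emo_dim, kernel_size=5, padding=2),
            nn.ReLU(),
        )

    def forward(self, mel):            # mel: (batch, n_mels, frames)
        h = self.conv(mel)             # (batch, emo_dim, frames)
        return h.mean(dim=2)           # average over time -> (batch, emo_dim)

class MultiSpeakerEmotionConditioner(nn.Module):
    """Combines a trainable speaker embedding with the emotion embedding and
    broadcasts the result along the CNN text encoder's time axis."""
    def __init__(self, n_speakers, spk_dim=64, emo_dim=64, n_mels=80):
        super().__init__()
        self.speaker_emb = nn.Embedding(n_speakers, spk_dim)  # trainable lookup
        self.emotion_enc = EmotionReferenceEncoder(n_mels, emo_dim)

    def forward(self, text_hidden, speaker_id, ref_mel):
        # text_hidden: (batch, channels, steps) from the CNN text encoder
        spk = self.speaker_emb(speaker_id)           # (batch, spk_dim)
        emo = self.emotion_enc(ref_mel)              # (batch, emo_dim)
        cond = torch.cat([spk, emo], dim=1)          # (batch, spk_dim + emo_dim)
        cond = cond.unsqueeze(2).expand(-1, -1, text_hidden.size(2))
        return torch.cat([text_hidden, cond], dim=1) # channel-wise concatenation

# Usage sketch with dummy tensors for a 4-speaker setup.
cond_net = MultiSpeakerEmotionConditioner(n_speakers=4)
text_hidden = torch.randn(2, 256, 120)   # dummy encoder output
speaker_id = torch.tensor([0, 3])
ref_mel = torch.randn(2, 80, 200)        # dummy reference spectrogram
out = cond_net(text_hidden, speaker_id, ref_mel)
print(out.shape)                         # torch.Size([2, 384, 120])
```

The speaker-code and emotion-code variants compared in the paper would replace the lookup table or the reference encoder with fixed one-hot-style codes; the sketch shows only the best-performing combination reported in the abstract.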
