Conference: Spoken Language Technology Workshop

VAW-GAN for Disentanglement and Recomposition of Emotional Elements in Speech

Abstract

Emotional voice conversion (EVC) aims to convert the emotion of speech from one state to another while preserving the linguistic content and speaker identity. In this paper, we study the disentanglement and recomposition of emotional elements in speech through a variational autoencoding Wasserstein generative adversarial network (VAW-GAN). We propose a speaker-dependent EVC framework based on VAW-GAN that includes two VAW-GAN pipelines, one for spectrum conversion and another for prosody conversion. We train a spectral encoder that disentangles emotion and prosody (F0) information from spectral features; we also train a prosodic encoder that disentangles emotion modulation of prosody (affective prosody) from linguistic prosody. At run-time, the decoder of the spectral VAW-GAN is conditioned on the output of the prosodic VAW-GAN. The vocoder takes the converted spectral and prosodic features to generate the target emotional speech. Experiments validate the effectiveness of our proposed method in both objective and subjective evaluations.
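
The run-time flow described in the abstract can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the names `ToyProsodyVawGan`, `ToySpectralVawGan`, `toy_vocoder`, and `convert_emotion`, the integer emotion ID, and all arithmetic inside the stand-ins are hypothetical placeholders introduced here. Real encoders, decoders, and vocoders would be trained models.

```python
from typing import Callable, List

Frames = List[List[float]]  # a sequence of spectral feature frames


class ToyProsodyVawGan:
    """Hypothetical stand-in for the prosodic VAW-GAN pipeline."""

    def encode(self, f0: List[float]) -> List[float]:
        # Placeholder encoder: subtract the utterance mean and pretend
        # the remainder is the emotion-independent (linguistic) prosody.
        mean = sum(f0) / len(f0)
        return [v - mean for v in f0]

    def decode(self, latent: List[float], emotion_id: int) -> List[float]:
        # Placeholder decoder: add a made-up per-emotion offset in Hz.
        offset = {0: 120.0, 1: 180.0}[emotion_id]
        return [v + offset for v in latent]


class ToySpectralVawGan:
    """Hypothetical stand-in for the spectral VAW-GAN pipeline."""

    def encode(self, spectrum: Frames) -> Frames:
        # Placeholder encoder: a plain copy standing in for a latent code
        # that is meant to be free of emotion and F0 information.
        return [list(frame) for frame in spectrum]

    def decode(self, latent: Frames, emotion_id: int,
               f0: List[float]) -> Frames:
        # Placeholder decoder: the output depends on both the emotion ID
        # and the converted F0, mirroring the conditioning in the abstract.
        return [[x + emotion_id + f / 100.0 for x in frame]
                for frame, f in zip(latent, f0)]


def toy_vocoder(spectrum: Frames, f0: List[float]) -> List[float]:
    # Placeholder vocoder: one number per frame instead of a waveform.
    return [sum(frame) * f for frame, f in zip(spectrum, f0)]


def convert_emotion(spectrum: Frames, f0: List[float], target_emotion: int,
                    prosody_model: ToyProsodyVawGan,
                    spectral_model: ToySpectralVawGan,
                    vocoder: Callable[[Frames, List[float]], List[float]]
                    ) -> List[float]:
    # 1. Prosody pipeline: convert F0 toward the target emotion.
    converted_f0 = prosody_model.decode(prosody_model.encode(f0),
                                        target_emotion)
    # 2. Spectral pipeline: the decoder is conditioned on the output
    #    of the prosody pipeline (the converted F0).
    converted_spectrum = spectral_model.decode(
        spectral_model.encode(spectrum), target_emotion, converted_f0)
    # 3. Vocoder: synthesize from the converted spectral and prosodic features.
    return vocoder(converted_spectrum, converted_f0)
```

The point of the sketch is the ordering: prosody is converted first, and its output feeds the spectral decoder before both feature streams reach the vocoder.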
