Journal: IEEE Transactions on Emerging Topics in Computational Intelligence

Unsupervised Representation Disentanglement Using Cross Domain Features and Adversarial Learning in Variational Autoencoder Based Voice Conversion


Abstract

An effective approach for voice conversion (VC) is to disentangle linguistic content from other components in the speech signal. The effectiveness of variational autoencoder (VAE) based VC (VAE-VC), for instance, strongly relies on this principle. In our prior work, we proposed a cross-domain VAE-VC (CDVAE-VC) framework, which utilized acoustic features of different properties to improve the performance of VAE-VC. We believed that the success came from more disentangled latent representations. In this article, we extend the CDVAE-VC framework by incorporating the concept of adversarial learning, in order to further increase the degree of disentanglement and thereby improve the quality and similarity of converted speech. More specifically, we first investigate the effectiveness of incorporating generative adversarial networks (GANs) into CDVAE-VC. Then, we consider the concept of domain adversarial training and add an explicit constraint to the latent representation, realized by a speaker classifier, to eliminate the speaker information that resides in the latent code. Experimental results confirm that the degree of disentanglement of the learned latent representation can be enhanced by both GANs and the speaker classifier. Meanwhile, subjective evaluation results in terms of quality and similarity scores demonstrate the effectiveness of our proposed methods.
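
The speaker-classifier constraint mentioned in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' implementation; it assumes the common domain-adversarial formulation in which the classifier minimizes a cross-entropy loss on the latent code while the encoder is rewarded when that loss is high. The function names, the weight `adv_weight`, and the example logits are all hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def speaker_cross_entropy(logits, speaker_id):
    """Cross-entropy of a (hypothetical) speaker classifier that reads the latent code."""
    return -math.log(softmax(logits)[speaker_id])

def encoder_objective(vae_loss, classifier_loss, adv_weight=1.0):
    """Assumed adversarial objective for the encoder: keep the VAE loss low
    while making the speaker classifier's loss high (sign flip)."""
    return vae_loss - adv_weight * classifier_loss

# Latent code that leaks speaker identity: the classifier is confident.
leaky = speaker_cross_entropy([4.0, 0.0, 0.0, 0.0], speaker_id=0)
# Latent code with no speaker information: the classifier can only guess.
clean = speaker_cross_entropy([0.0, 0.0, 0.0, 0.0], speaker_id=0)

# With the same VAE loss, the encoder objective prefers the speaker-free code.
print(encoder_objective(2.0, leaky) > encoder_objective(2.0, clean))
```

Under these assumptions, a latent code from which the classifier cannot recover the speaker yields a lower encoder objective, which is the direction in which the explicit constraint would push the representation.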
