Unsupervised Representation Disentanglement Using Cross Domain Features and Adversarial Learning in Variational Autoencoder Based Voice Conversion

Wen-Chin Huang; Hao Luo; Hsin-Te Hwang; Chen-Chou Lo; Yu-Huai Peng; Yu Tsao; Hsin-Min Wang


Abstract

An effective approach for voice conversion (VC) is to disentangle linguistic content from other components in the speech signal. The effectiveness of variational autoencoder (VAE) based VC (VAE-VC), for instance, strongly relies on this principle. In our prior work, we proposed a cross-domain VAE-VC (CDVAE-VC) framework, which utilized acoustic features of different properties, to improve the performance of VAE-VC. We believed that the success came from more disentangled latent representations. In this article, we extend the CDVAE-VC framework by incorporating the concept of adversarial learning, in order to further increase the degree of disentanglement, thereby improving the quality and similarity of converted speech. More specifically, we first investigate the effectiveness of incorporating the generative adversarial networks (GANs) with CDVAE-VC. Then, we consider the concept of domain adversarial training and add an explicit constraint to the latent representation, realized by a speaker classifier, to explicitly eliminate the speaker information that resides in the latent code. Experimental results confirm that the degree of disentanglement of the learned latent representation can be enhanced by both GANs and the speaker classifier. Meanwhile, subjective evaluation results in terms of quality and similarity scores demonstrate the effectiveness of our proposed methods.
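The speaker-classifier constraint described in the abstract can be illustrated with a small numeric sketch. This is not the authors' implementation: the function names, the diagonal-Gaussian posterior, the standard-normal prior, and the `adv_weight` parameter are all assumptions made for illustration. It only shows the general shape of a domain-adversarial objective, in which the classifier minimizes a speaker cross-entropy on the latent code while the encoder receives the same term with the opposite sign.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def speaker_cross_entropy(logits, speaker_id):
    """Cross-entropy of a (hypothetical) speaker classifier reading the latent code."""
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[speaker_id]


def encoder_objective(recon_error, mu, log_var, logits, speaker_id, adv_weight=1.0):
    """Loss minimized by the encoder under an assumed weighting.

    The classifier itself would minimize speaker_cross_entropy; the encoder
    gets that term negated, so it is rewarded when the classifier cannot
    recover the speaker from the latent code.
    """
    vae_loss = recon_error + kl_to_standard_normal(mu, log_var)
    return vae_loss - adv_weight * speaker_cross_entropy(logits, speaker_id)


# A classifier that is confident about the true speaker makes the encoder
# objective larger than one that is at chance level.
confident = encoder_objective(1.0, [0.0, 0.0], [0.0, 0.0], [5.0, 0.0], 0)
at_chance = encoder_objective(1.0, [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], 0)
```

In this toy setting, `at_chance` is lower than `confident`, which is the direction the encoder is pushed in: toward latent codes from which the speaker identity cannot be predicted.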

Bibliographic record

Source
IEEE Transactions on Emerging Topics in Computational Intelligence, 2020, Issue 4, pp. 468-479 (12 pages)
Authors
Wen-Chin Huang; Hao Luo; Hsin-Te Hwang; Chen-Chou Lo; Yu-Huai Peng; Yu Tsao; Hsin-Min Wang
Author affiliations

Institute of Information Science, Academia Sinica, Taipei, Taiwan

Institute of Information Science, Academia Sinica, Taipei, Taiwan

Institute of Information Science, Academia Sinica, Taipei, Taiwan

Institute of Information Science, Academia Sinica, Taipei, Taiwan

Institute of Information Science, Academia Sinica, Taipei, Taiwan

Research Center of Information Technology, Institute of Information Science, Academia Sinica, Taipei, Taiwan

Institute of Information Science, Academia Sinica, Taipei, Taiwan

Indexing information
Original format: PDF
Language: English
Keywords
Training; Gallium nitride; Decoding; Measurement; Speech processing; Vocoders; Task analysis;

