Conference: IEEE International Conference on Multimedia and Expo

Data Augmentation for Monaural Singing Voice Separation Based on Variational Autoencoder-Generative Adversarial Network

Abstract

Random mixing and circular shifting are used to augment the training set and improve the separation performance of deep neural network (DNN)-based monaural singing voice separation (MSVS). However, these manual methods rest on the unrealistic assumption that the two sources in a mixture are independent of each other, which limits separation performance. This paper proposes a data augmentation method based on a variational autoencoder (VAE) and a generative adversarial network (GAN), called VAE-GAN. The VAE models the observed spectra of the sources (vocals and music) separately and reconstructs new spectra from the latent space. The GAN's discriminator is introduced to measure the correlation between the latent variables of the vocals and the music produced by the VAE's probabilistic encoder. This adversarial mechanism in the VAE's latent space can learn the synthetic likelihood and ultimately decode high-quality spectral samples, which further improves the separation performance of general MSVS networks.
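
For orientation, the two manual baseline augmentations named in the abstract (not the proposed VAE-GAN) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `circular_shift` and `random_mix` are hypothetical, the signals are treated as plain lists of time-domain samples, and the mixture is formed by simple addition. The abstract does not specify any of these details.

```python
import random


def circular_shift(signal, shift):
    """Rotate a sequence by `shift` positions, wrapping around the end."""
    if not signal:
        return list(signal)
    k = shift % len(signal)
    if k == 0:
        return list(signal)
    return list(signal[-k:]) + list(signal[:-k])


def random_mix(vocals, accompaniments, rng):
    """Pair a random vocal with a randomly shifted accompaniment.

    Assumption: both pools hold non-empty sample lists. The pairing is
    drawn independently, which is exactly the independence assumption
    the abstract criticizes in these manual methods.
    """
    vocal = rng.choice(vocals)
    accomp = rng.choice(accompaniments)
    accomp = circular_shift(accomp, rng.randrange(len(accomp)))
    n = min(len(vocal), len(accomp))  # truncate to the shorter source
    mixture = [v + a for v, a in zip(vocal[:n], accomp[:n])]
    return mixture, list(vocal[:n]), accomp[:n]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only for reproducibility
    vocals = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.5, 0.5, 0.5]]
    music = [[1.0, 0.0, -1.0, 0.0]]
    mixture, vocal, accomp = random_mix(vocals, music, rng)
    print(mixture, vocal, accomp)
```

Each call yields a new (mixture, vocal, accompaniment) training triple. Because the vocal and accompaniment are drawn without regard to each other, the synthetic mixtures carry no correlation between the sources, which is the limitation the abstract attributes to these methods.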