Conference: International Conference on Signal Processing and Communications
Effectiveness of Transfer Learning on Singing Voice Conversion in the Presence of Background Music

Abstract

Singing voice conversion (SVC) is the task of converting the perceived identity of a source speaker to that of a target speaker without changing the lyrics or rhythm. Recent approaches to traditional voice conversion use generative models such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs). For SVC, however, GANs have not been explored much: the only system proposed in the literature uses a traditional GAN on parallel data. Collecting parallel data in real scenarios (with the same background music) is not feasible. Moreover, SVC in the presence of background music is especially challenging because it requires separating the vocals from the input, a step that introduces some noise. Therefore, in this paper, we propose a transfer learning and fine-tuning-based cycle-consistent GAN (CycleGAN) model for non-parallel SVC, where music source separation is done using a Deep Attractor Network (DANet). We designed seven different systems to identify the best combination of transfer learning and fine-tuning. We use the more challenging MUSDB18 database as our primary dataset and the NUS-48E database to pre-train the CycleGAN. We perform extensive analysis via objective and subjective measures and report that the CycleGAN model pre-trained on the NUS-48E corpus performs best among the systems described in the paper, with a mean opinion score (MOS) of 4.14 out of 5 for naturalness.
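
For readers unfamiliar with why a CycleGAN can be trained on non-parallel data, the standard cycle-consistency term is sketched below. This is the generic CycleGAN formulation, not necessarily the exact objective used in the paper, which the abstract does not state. The symbols are assumptions introduced for illustration: X and Y denote source- and target-singer features, G_{X→Y} and G_{Y→X} are the two generators, and λ_cyc is a weighting hyperparameter.

```latex
% Generic CycleGAN cycle-consistency loss (illustrative; symbols are assumed, not from the abstract)
\mathcal{L}_{\mathrm{cyc}}
  = \mathbb{E}_{x \sim X}\!\left[ \left\| G_{Y \to X}\big(G_{X \to Y}(x)\big) - x \right\|_1 \right]
  + \mathbb{E}_{y \sim Y}\!\left[ \left\| G_{X \to Y}\big(G_{Y \to X}(y)\big) - y \right\|_1 \right]

% Combined with the two adversarial losses:
\mathcal{L}
  = \mathcal{L}_{\mathrm{adv}}(G_{X \to Y}, D_Y)
  + \mathcal{L}_{\mathrm{adv}}(G_{Y \to X}, D_X)
  + \lambda_{\mathrm{cyc}} \, \mathcal{L}_{\mathrm{cyc}}
```

Because each sample is compared only with its own round-trip reconstruction, no time-aligned source and target pairs are needed, which is what makes the non-parallel setting workable.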
