

Effectiveness of Transfer Learning on Singing Voice Conversion in the Presence of Background Music


Abstract

Singing voice conversion (SVC) is the task of converting the perceived identity of the source speaker to that of the target speaker without changing the lyrics or rhythm. Recent approaches to conventional voice conversion use generative models such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs). For SVC, however, GANs have not been explored much: the only system proposed in the literature uses a traditional GAN on parallel data. Collecting parallel data for real scenarios (with the same background music) is not feasible. Moreover, SVC in the presence of background music is one of the most challenging tasks, because it requires separating the vocals from the input mixture, and the separated vocals will contain some noise. Therefore, in this paper, we propose a transfer learning and fine-tuning-based cycle-consistent GAN (CycleGAN) model for non-parallel SVC, where music source separation is done using a Deep Attractor Network (DANet). We designed seven different systems to identify the best combination of transfer learning and fine-tuning. We use a more challenging database, MUSDB18, as our primary dataset, and the NUS-48E database to pre-train the CycleGAN. We perform extensive analysis via objective and subjective measures and report that, with a mean opinion score (MOS) of 4.14 out of 5 for naturalness, the CycleGAN model pre-trained on the NUS-48E corpus performs best among the systems described in the paper.
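
The abstract names a cycle-consistent GAN for non-parallel conversion but does not spell out the loss. As a rough illustration only, the sketch below computes the standard CycleGAN cycle-consistency term (L1 distance between each input and its round-trip reconstruction, summed over both directions). The function names, the list-of-floats feature frames, and the toy mappings are assumptions made for this example; they are not taken from the paper, whose generators are neural networks and whose features are not described on this page.

```python
from typing import Callable, List, Sequence

Frame = List[float]
Mapping = Callable[[Sequence[float]], Frame]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equal-length feature frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(
    source_frames: Sequence[Sequence[float]],
    target_frames: Sequence[Sequence[float]],
    g_src_to_tgt: Mapping,
    g_tgt_to_src: Mapping,
) -> float:
    """Average round-trip L1 error: source->target->source plus target->source->target."""
    forward = [l1_distance(g_tgt_to_src(g_src_to_tgt(x)), x) for x in source_frames]
    backward = [l1_distance(g_src_to_tgt(g_tgt_to_src(y)), y) for y in target_frames]
    return sum(forward) / len(forward) + sum(backward) / len(backward)


# Hypothetical toy mappings standing in for the two generators.
def shift_up(frame: Sequence[float]) -> Frame:
    return [v + 1.0 for v in frame]


def shift_down(frame: Sequence[float]) -> Frame:
    return [v - 1.0 for v in frame]


source = [[0.5, 1.5], [2.0, 3.0]]   # made-up source-singer frames
target = [[1.0, 2.0], [4.0, 0.0]]   # made-up, unpaired target-singer frames

# Mappings that invert each other reconstruct every frame exactly.
consistent = cycle_consistency_loss(source, target, shift_up, shift_down)

# Mappings that do not invert each other are penalised.
inconsistent = cycle_consistency_loss(source, target, shift_up, shift_up)
```

Because the loss compares each frame only with its own reconstruction, the source and target sets never need to be time-aligned or paired, which is what makes this kind of objective usable on non-parallel data.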


Bibliographic details

Source
International Conference on Signal Processing and Communications, 2020, pp. 1-5 (5 pages)
Authors
Divyesh G. Rajpura; Jui Shah; Maitreya Patel; Harshit Malaviya; Kirtana Phatnani; Hemant A. Patil;
Original format: PDF
Keywords
Static VAr compensators; Task analysis; Databases; Gallium nitride; Generators; Rhythm;


Similar literature


1. Contemporary Commercial Music Singing Students-Voice Quality and Vocal Function at the Beginning of Singing Training [J]. Sielska-Badurek Ewelina M., Sobol Maria, Olszowska Katarzyna. Journal of Voice: official journal of the Voice Foundation. 2018, Issue 6
2. Emotion Recognition From Singing Voices Using Contemporary Commercial Music and Classical Styles [J]. Journal of Voice: official journal of the Voice Foundation. 2019, Issue 4
3. VOICE2TUBA: transforming singing voice into a musical instrument [J]. Santacruz Jose L., Tardon Lorenzo J., Barbancho Isabel. Multimedia Tools and Applications. 2017, Issue 7
4. Separation of singing voice from background musical noise using modified NMF and Filtering [C]. Snehal S. Gaikwad, Pallavi P. Ingale, S. L. Nalbalwar. International Conference on Electrical, Electronics, and Optimization Techniques. 2016
5. Recommending Music for Language Learning: The Problem of Singing Voice Intelligibility [D]. Ibrahim, Karim M. 2018
6. Singing in the brain: Neural representation of music and voice as revealed by fMRI [O]. Jocelyne C. Whitehead, Jorge L. Armony. 2018
7. Transfer Learning for Improving Singing-Voice Detection in Polyphonic Instrumental Music [O]. Yuanbo Hou, Frank K. Soong, Jian Luan. 2020
