Singing Voice Conversion with Non-parallel Data

Abstract

Singing voice conversion is the task of converting a song sung by a source singer into the voice of a target singer. In this paper, we propose applying a parallel-data-free, many-to-one voice conversion technique to singing voices. A phonetic posterior feature is first generated by decoding the singing voice with a robust automatic speech recognition (ASR) engine. Then, a trained Recurrent Neural Network (RNN) with a Deep Bidirectional Long Short-Term Memory (DBLSTM) structure models the mapping from person-independent content to the acoustic features of the target person. F0 and aperiodicity are extracted from the original singing voice and combined with the predicted acoustic features to reconstruct the target singing voice through a vocoder. In the converted singing voice, the source singer sounds similar to the target singer. To our knowledge, this is the first study that uses non-parallel data to train a singing voice conversion system. Subjective evaluations demonstrate that the proposed method effectively converts singing voices.
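The data flow described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function `convert_singing_voice`, the `SourceAnalysis` container, and all `toy_*` stand-ins are hypothetical names introduced here, and the toy stages only mimic the shape of the data. In a real system they would be an ASR engine, a trained DBLSTM, a feature extractor, and a vocoder.

```python
from dataclasses import dataclass
from typing import Callable, List

Frame = List[float]  # one feature vector per analysis frame


@dataclass
class SourceAnalysis:
    """Features taken directly from the source singing voice (hypothetical container)."""
    f0: List[float]            # one fundamental-frequency value per frame
    aperiodicity: List[Frame]  # one aperiodicity vector per frame


def convert_singing_voice(
    source_frames: List[Frame],
    asr_posteriors: Callable[[List[Frame]], List[Frame]],
    target_model: Callable[[List[Frame]], List[Frame]],
    analyze: Callable[[List[Frame]], SourceAnalysis],
    vocoder: Callable[[List[float], List[Frame], List[Frame]], List[float]],
) -> List[float]:
    # 1. Decode the source voice into phonetic posteriors (singer-independent content).
    ppg = asr_posteriors(source_frames)
    # 2. Map the posteriors to acoustic features of the target singer.
    spectral = target_model(ppg)
    # 3. Take F0 and aperiodicity from the original (source) singing voice.
    src = analyze(source_frames)
    if not (len(spectral) == len(src.f0) == len(src.aperiodicity)):
        raise ValueError("all feature streams must have the same number of frames")
    # 4. Reconstruct the converted singing voice with a vocoder.
    return vocoder(src.f0, spectral, src.aperiodicity)


# Toy stand-ins (assumptions for illustration only, not real models).
def toy_posteriors(frames: List[Frame]) -> List[Frame]:
    # Normalise each frame so it sums to 1, like a posterior distribution.
    return [[v / sum(frame) for v in frame] for frame in frames]


def toy_target_model(ppg: List[Frame]) -> List[Frame]:
    # Placeholder for the trained DBLSTM: scale every posterior by 2.
    return [[2.0 * p for p in frame] for frame in ppg]


def toy_analyze(frames: List[Frame]) -> SourceAnalysis:
    # Placeholder analysis: constant F0 and a one-element aperiodicity vector.
    return SourceAnalysis(f0=[220.0] * len(frames),
                          aperiodicity=[[0.1] for _ in frames])


def toy_vocoder(f0: List[float], spectral: List[Frame],
                aperiodicity: List[Frame]) -> List[float]:
    # Placeholder synthesis: one output number per frame.
    return [f * sum(s) for f, s in zip(f0, spectral)]


frames = [[1.0, 3.0], [2.0, 2.0]]
wave = convert_singing_voice(frames, toy_posteriors, toy_target_model,
                             toy_analyze, toy_vocoder)
print(wave)  # [440.0, 440.0]
```

The point of the sketch is the split between the two paths: content goes through the ASR and the target-singer model, while F0 and aperiodicity bypass both and reach the vocoder unchanged. Because only the target-singer model depends on the target's recordings, no parallel source and target recordings are needed for training.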

Bibliographic record

Source: Conference on Multimedia Information Processing and Retrieval, 2019, pp. 292-296 (5 pages)
Authors: Xin Chen; Wei Chu; Jinxi Guo; Ning Xu
Original format: PDF
Keywords: Acoustics; Phonetics; Training; Feature extraction; Recurrent neural networks; Speech recognition; Hidden Markov models
