Exploring Cross-lingual Singing Voice Synthesis Using Speech Data

Abstract

State-of-the-art singing voice synthesis (SVS) models can generate a natural singing voice for a target speaker, given his/her speaking/singing data in the same language. However, there may be challenging conditions where only speech data in a non-target language of the target speaker is available. In this paper, we present a cross-lingual SVS system that can synthesize an English speaker's singing voice in Mandarin from musical scores with only her speech data in English. The proposed cross-lingual SVS system contains four parts: a BLSTM-based duration model, a pitch model, a cross-lingual acoustic model, and a neural vocoder. The acoustic model employs an encoder-decoder architecture conditioned on pitch, phoneme duration, speaker information, and language information. An adversarially trained speaker classifier is employed to discourage the text encodings from capturing speaker information. Objective evaluation and subjective listening tests demonstrate that the proposed cross-lingual SVS system can generate singing voice with decent naturalness and fair speaker similarity. We also find that adding singing data or multi-speaker monolingual speech data further improves generalization on pronunciation and pitch accuracy.
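As a rough illustration of the conditioning described in the abstract, the sketch below expands score-level phonemes into frame-level inputs carrying pitch, duration, speaker, and language information. It is not the authors' implementation: the function names, the 12.5 ms frame shift, and the use of the score's note pitch directly as F0 are assumptions made for this example. In the paper, durations and pitch come from dedicated models.

```python
def midi_to_hz(note: int) -> float:
    """Convert a MIDI note number to Hz (equal temperament, A4 = MIDI 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def expand_to_frames(phonemes, durations_sec, midi_notes,
                     speaker_id, language_id, frame_shift_sec=0.0125):
    """Repeat each phoneme for as many frames as its duration covers.

    Hypothetical helper: every frame carries the phoneme, an F0 value taken
    from the score note (an assumption), and speaker/language identifiers.
    """
    frames = []
    for phoneme, duration, note in zip(phonemes, durations_sec, midi_notes):
        n_frames = max(1, round(duration / frame_shift_sec))
        f0 = midi_to_hz(note)
        for _ in range(n_frames):
            frames.append({
                "phoneme": phoneme,
                "f0_hz": f0,
                "speaker": speaker_id,
                "language": language_id,
            })
    return frames


# Usage: one Mandarin syllable "ni" sung on middle C (MIDI 60) by an English speaker.
frames = expand_to_frames(
    phonemes=["n", "i"],
    durations_sec=[0.1, 0.3],
    midi_notes=[60, 60],
    speaker_id="english_speaker_01",  # illustrative label, not from the paper
    language_id="zh",
)
print(len(frames), round(frames[0]["f0_hz"], 2))
```

In a full system, the frame sequence would feed the acoustic model, whose output features are then passed to the neural vocoder.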

Bibliographic details

Source: International Symposium on Chinese Spoken Language Processing, 2021, pp. 1-5 (5 pages)
Authors: Yuewen Cao; Songxiang Liu; Shiyin Kang; Na Hu; Peng Liu; Xunying Liu; Dan Su; Dong Yu; Helen Meng
Original format: PDF
Keywords: Training; Vocoders; Data models; Encoding; Timbre
