Conference: International Symposium on Chinese Spoken Language Processing

Exploring Cross-lingual Singing Voice Synthesis Using Speech Data

Abstract

State-of-the-art singing voice synthesis (SVS) models can generate a natural singing voice for a target speaker, given his/her speaking or singing data in the same language. However, there may be challenging conditions where the only data available from the target speaker is speech in a non-target language. In this paper, we present a cross-lingual SVS system that can synthesize an English speaker’s singing voice in Mandarin from musical scores, using only her speech data in English. The proposed cross-lingual SVS system contains four parts: a BLSTM-based duration model, a pitch model, a cross-lingual acoustic model and a neural vocoder. The acoustic model employs an encoder-decoder architecture conditioned on pitch, phoneme duration, speaker information and language information. An adversarially trained speaker classifier is employed to discourage the text encodings from capturing speaker information. Objective evaluation and subjective listening tests demonstrate that the proposed cross-lingual SVS system can generate a singing voice with decent naturalness and fair speaker similarity. We also find that adding singing data or multi-speaker monolingual speech data further improves generalization in pronunciation and pitch accuracy.
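
The abstract does not say how the acoustic model's conditioning is implemented. As a minimal sketch under assumptions, one common way to condition a decoder on phoneme duration, pitch, speaker and language is to repeat each phoneme encoding for its duration in frames and then concatenate the per-frame pitch value with fixed speaker and language vectors. The function names, vector sizes and values below are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def expand_by_duration(phoneme_vecs: Sequence[Sequence[float]],
                       durations: Sequence[int]) -> List[List[float]]:
    """Repeat each phoneme-level vector once per frame of its duration."""
    if len(phoneme_vecs) != len(durations):
        raise ValueError("need exactly one duration per phoneme")
    frames: List[List[float]] = []
    for vec, n_frames in zip(phoneme_vecs, durations):
        frames.extend(list(vec) for _ in range(n_frames))
    return frames


def build_decoder_inputs(phoneme_vecs: Sequence[Sequence[float]],
                         durations: Sequence[int],
                         frame_pitch: Sequence[float],
                         speaker_vec: Sequence[float],
                         language_vec: Sequence[float]) -> List[List[float]]:
    """Assumed conditioning scheme: frame encoding + pitch + speaker + language."""
    frames = expand_by_duration(phoneme_vecs, durations)
    if len(frames) != len(frame_pitch):
        raise ValueError("pitch contour must have one value per frame")
    return [
        frame + [pitch] + list(speaker_vec) + list(language_vec)
        for frame, pitch in zip(frames, frame_pitch)
    ]


# Toy usage: two phonemes lasting 2 frames and 1 frame (all values hypothetical).
inputs = build_decoder_inputs(
    phoneme_vecs=[[0.1, 0.2], [0.3, 0.4]],
    durations=[2, 1],
    frame_pitch=[220.0, 220.0, 246.9],
    speaker_vec=[1.0, 0.0],    # stand-in for a speaker embedding
    language_vec=[0.0, 1.0],   # stand-in for a language embedding
)
```

In this toy setup each frame carries the same speaker and language vectors, so changing only the language vector while keeping the speaker vector fixed is one way to picture the cross-lingual case the abstract describes.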
