Conference: IEEE International Conference on Acoustics, Speech and Signal Processing

CYBORG SPEECH: DEEP MULTILINGUAL SPEECH SYNTHESIS FOR GENERATING SEGMENTAL FOREIGN ACCENT WITH NATURAL PROSODY


Abstract

We describe a new application of deep-learning-based speech synthesis, namely multilingual speech synthesis for generating controllable foreign accent. Specifically, we train a DBLSTM-based acoustic model on non-accented multilingual speech recordings from a speaker native in several languages. By copying durations and pitch contours from a pre-recorded utterance of the desired prompt, natural prosody is achieved. We call this paradigm "cyborg speech" as it combines human and machine speech parameters. Segmentally accented speech is produced by interpolating specific quin-phone linguistic features towards phones from the other language that represent non-native mispronunciations. Experiments on synthetic American-English-accented Japanese speech show that subjective synthesis quality matches monolingual synthesis, that natural pitch is maintained, and that naturalistic phone substitutions generate output that is perceived as having an American foreign accent, even though only non-accented training data was used.
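The segmental accent control described in the abstract can be illustrated with a minimal sketch, assuming the phone identities in a quinphone context are encoded as one-hot vectors and that "interpolating" means a linear blend between the native phone and its substitute. The phone inventory, the function names, and the choice to apply the substitution at every context position are hypothetical and for illustration only; the abstract does not specify the actual feature layout.

```python
import numpy as np

# Hypothetical toy inventory mixing Japanese and English phones (not from the paper).
PHONES = ["a", "i", "u", "r_ja", "r_en", "sil"]


def one_hot(phone):
    """Return the one-hot identity vector of a phone in the toy inventory."""
    vec = np.zeros(len(PHONES))
    vec[PHONES.index(phone)] = 1.0
    return vec


def interpolate_phone(native, substitute, alpha):
    """Linearly blend two phone identities.

    alpha = 0.0 keeps the native phone, alpha = 1.0 is a full substitution.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * one_hot(native) + alpha * one_hot(substitute)


def quinphone_features(context, alpha, substitutions):
    """Build a feature vector for a 5-phone context (2 left, centre, 2 right).

    Assumption: any phone listed in `substitutions` is blended towards its
    substitute wherever it appears in the context.
    """
    if len(context) != 5:
        raise ValueError("a quinphone context has exactly five phones")
    parts = [
        interpolate_phone(p, substitutions.get(p, p), alpha) for p in context
    ]
    return np.concatenate(parts)


# Usage: move the Japanese /r/ halfway towards the English /r/.
feats = quinphone_features(
    ["sil", "a", "r_ja", "i", "u"], alpha=0.5, substitutions={"r_ja": "r_en"}
)
print(feats.reshape(5, len(PHONES)))
```

In this sketch, a vector like `feats` would be fed to the acoustic model in place of the unmodified linguistic features, while durations and pitch are still taken from the recorded utterance. Varying `alpha` would give a continuous control over accent strength.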
