SYNTHESIS OF SPEECH FROM TEXT IN A VOICE OF A TARGET SPEAKER USING NEURAL NETWORKS

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech synthesis. The methods, systems, and apparatus include actions of obtaining an audio representation of speech of a target speaker, obtaining input text for which speech is to be synthesized in a voice of the target speaker, generating a speaker vector by providing the audio representation to a speaker encoder engine that is trained to distinguish speakers from one another, generating an audio representation of the input text spoken in the voice of the target speaker by providing the input text and the speaker vector to a spectrogram generation engine that is trained using voices of reference speakers to generate audio representations, and providing the audio representation of the input text spoken in the voice of the target speaker for output.
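
The data flow described in the abstract can be sketched roughly as follows. This is only an illustrative outline, not the patented implementation: the function names `speaker_encoder`, `spectrogram_generator`, and `synthesize` are hypothetical, and both engines are replaced by trivial arithmetic stand-ins rather than trained neural networks. The sketch shows only how a fixed-size speaker vector derived from reference audio conditions the generation of an audio representation for new text.

```python
import math
from typing import List

Frame = List[float]        # one spectrogram-like frame (toy representation)
Spectrogram = List[Frame]  # a sequence of frames over time


def speaker_encoder(reference_audio: Spectrogram) -> List[float]:
    """Hypothetical stand-in for the trained speaker encoder engine.

    Averages the reference frames over time and L2-normalizes the result,
    so an utterance of any length maps to one fixed-size speaker vector.
    """
    dim = len(reference_audio[0])
    count = len(reference_audio)
    mean = [sum(frame[i] for frame in reference_audio) / count for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean)) or 1.0  # avoid division by zero
    return [x / norm for x in mean]


def spectrogram_generator(text: str, speaker_vector: List[float]) -> Spectrogram:
    """Hypothetical stand-in for the trained spectrogram generation engine.

    Emits one frame per input character. Each frame is the speaker vector
    scaled by a value derived from the character, which mimics conditioning
    the output on both the text and the speaker.
    """
    frames: Spectrogram = []
    for ch in text:
        scale = (ord(ch) % 32) / 32.0
        frames.append([scale * s for s in speaker_vector])
    return frames


def synthesize(reference_audio: Spectrogram, text: str) -> Spectrogram:
    """Chain the two steps: reference audio -> speaker vector -> output audio."""
    speaker_vector = speaker_encoder(reference_audio)
    return spectrogram_generator(text, speaker_vector)


if __name__ == "__main__":
    reference = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # made-up reference frames
    output = synthesize(reference, "hi")
    print(len(output), "frames of width", len(output[0]))
```

In a real system the output would typically be passed to a vocoder or similar component to produce a waveform; the abstract states only that the audio representation is provided for output.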