SYSTEMS AND METHODS FOR MULTI-SPEAKER NEURAL TEXT-TO-SPEECH

Abstract

Described herein are systems and methods for augmenting neural speech synthesis networks with low-dimensional trainable speaker embeddings in order to generate speech in different voices from a single model. As a starting point for multi-speaker experiments, improved single-speaker model embodiments (referred to generally as Deep Voice 2 embodiments) were developed, along with a post-processing neural vocoder for Tacotron, a neural character-to-spectrogram model. New multi-speaker speech synthesis techniques were applied to both the Deep Voice 2 and Tacotron embodiments and evaluated on two multi-speaker TTS datasets, showing that neural text-to-speech systems can learn hundreds of unique voices from twenty-five minutes of audio per speaker.
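The central idea above, augmenting a synthesis network with a low-dimensional trainable speaker embedding so that one model can speak in many voices, can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than the claimed implementation: the class name `SpeakerConditioner`, the single affine projection followed by a softsign nonlinearity, and the choice to condition by adding the projected vector to a hidden state. In a real system the embedding table and projection would be learned by backpropagation jointly with the rest of the network, and random initialization stands in for that here.

```python
import random


class SpeakerConditioner:
    """Hypothetical sketch: a trainable speaker embedding table plus one
    site-specific affine projection that injects speaker identity into a
    hidden activation vector of a synthesis network."""

    def __init__(self, num_speakers, embed_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        # One low-dimensional vector per speaker (learned jointly in practice;
        # randomly initialized here for illustration).
        self.embeddings = [[rng.gauss(0.0, 0.1) for _ in range(embed_dim)]
                           for _ in range(num_speakers)]
        # Assumed affine projection from embedding space to this site's size.
        self.weight = [[rng.gauss(0.0, 0.1) for _ in range(embed_dim)]
                       for _ in range(hidden_dim)]
        self.bias = [0.0] * hidden_dim

    @staticmethod
    def softsign(x):
        # Bounded nonlinearity: output stays strictly inside (-1, 1).
        return x / (1.0 + abs(x))

    def site_vector(self, speaker_id):
        """Project the speaker's embedding to a hidden-sized vector."""
        e = self.embeddings[speaker_id]
        return [self.softsign(sum(w * v for w, v in zip(row, e)) + b)
                for row, b in zip(self.weight, self.bias)]

    def condition(self, hidden, speaker_id):
        """Add the speaker-dependent vector to a hidden state (additive
        conditioning is one of several possible schemes, assumed here)."""
        s = self.site_vector(speaker_id)
        if len(hidden) != len(s):
            raise ValueError("hidden size does not match projection size")
        return [h + v for h, v in zip(hidden, s)]


# Usage: the same hidden state yields different outputs per speaker.
cond = SpeakerConditioner(num_speakers=4, embed_dim=16, hidden_dim=8)
h = [0.0] * 8
out_a = cond.condition(h, speaker_id=0)
out_b = cond.condition(h, speaker_id=1)
```

In this sketch only the embedding table grows with the number of speakers, so each additional voice costs `embed_dim` parameters rather than a separate model; that sharing is one reason a single network can plausibly cover many voices from limited per-speaker audio.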