
Using Text and Acoustic Features in Predicting Glottal Excitation Waveforms for Parametric Speech Synthesis with Recurrent Neural Networks



Abstract

This work studies the use of deep learning methods to directly model glottal excitation waveforms from context-dependent text features in a text-to-speech synthesis system. Glottal vocoding is integrated into a deep neural network-based text-to-speech framework in which text and acoustic features can be used flexibly as either network inputs or outputs. Long short-term memory recurrent neural networks are utilised in two stages: first, to map text features to acoustic features, and second, to predict glottal waveforms from the text and/or acoustic features. Results show that using the text features directly yields quality similar to predicting the excitation from acoustic features, and both outperform a baseline system that uses a fixed glottal pulse for excitation generation.
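The two-stage pipeline described above can be sketched as follows. This is a minimal illustrative implementation, not the authors' actual architecture: the feature dimensions, hidden sizes, and single-layer LSTMs are assumptions chosen for readability, and the second stage is instantiated twice to show the two input variants compared in the paper (acoustic features vs. text features directly).

```python
import torch
import torch.nn as nn

# Hypothetical feature dimensions -- placeholders, not taken from the paper.
TEXT_DIM, ACOUSTIC_DIM, WAVEFORM_DIM = 300, 60, 400


class TextToAcoustic(nn.Module):
    """Stage 1: map context-dependent text features to acoustic features."""

    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(TEXT_DIM, 128, batch_first=True)
        self.proj = nn.Linear(128, ACOUSTIC_DIM)

    def forward(self, text_feats):
        hidden, _ = self.lstm(text_feats)
        return self.proj(hidden)


class ExcitationPredictor(nn.Module):
    """Stage 2: predict glottal excitation waveform frames from
    either acoustic features or text features (input size varies)."""

    def __init__(self, in_dim):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, 256, batch_first=True)
        self.proj = nn.Linear(256, WAVEFORM_DIM)

    def forward(self, feats):
        hidden, _ = self.lstm(feats)
        return self.proj(hidden)


stage1 = TextToAcoustic()
from_acoustic = ExcitationPredictor(ACOUSTIC_DIM)  # variant A
from_text = ExcitationPredictor(TEXT_DIM)          # variant B

# Dummy batch: 2 utterances, 50 frames of text features each.
text = torch.randn(2, 50, TEXT_DIM)
acoustic = stage1(text)
glottal_a = from_acoustic(acoustic)  # excitation from acoustic features
glottal_b = from_text(text)          # excitation directly from text features
print(glottal_a.shape, glottal_b.shape)
```

Both variants emit one fixed-length waveform segment per frame; the paper's finding is that variant B matches the quality of variant A.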
