IEEE/ACM Transactions on Audio, Speech, and Language Processing

Improving Trajectory Modelling for DNN-Based Speech Synthesis by Using Stacked Bottleneck Features and Minimum Generation Error Training



Abstract

We propose two novel techniques, stacked bottleneck features and minimum generation error (MGE) training, to improve the performance of deep neural network (DNN)-based speech synthesis. The techniques address the related issues of frame-by-frame independence and the neglected interaction between static and dynamic features within current typical DNN-based synthesis frameworks. Stacking bottleneck features, which are an acoustically informed linguistic representation, provides an efficient way to include more detailed linguistic context at the input. The MGE training criterion minimises overall output trajectory error across an utterance, rather than minimising the error per frame independently, and thus takes into account the interaction between static and dynamic features. The two techniques can easily be combined to further improve performance. We present both objective and subjective results that demonstrate the effectiveness of the proposed techniques. The subjective results show that combining the two techniques leads to significantly more natural synthetic speech than conventional DNN or long short-term memory recurrent neural network systems produce.
