International Conference on Speech and Computer

Ensemble Deep Neural Network Based Waveform-Driven Stress Model for Speech Synthesis

Abstract

Stress annotations in the training corpus of a speech synthesis system are usually obtained by applying language rules to the transcripts. However, the stress patterns actually realized in the waveform are not guaranteed to be canonical; they can deviate from the locations defined by the language rules, mostly because of speaker-dependent factors. Therefore, stress models trained on such corpora can be far from perfect. This paper proposes a waveform-based stress annotation technique. Using these stress classes, four feedforward deep neural networks (DNNs) were trained to model the fundamental frequency (F0) of speech. During synthesis, stress labels are generated from the textual input, and an ensemble of the four DNNs predicts the F0 trajectories. Both objective and subjective evaluations were carried out. The results show that the proposed method yields higher quality than vanilla DNN-based F0 models.
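The synthesis step can be pictured with a toy sketch. The abstract does not say how the four networks are combined, so this sketch assumes, purely for illustration, that each network is tied to one stress class and that the stress label predicted from text routes each syllable to the matching network. The class names, the `Syllable` feature vectors, and the linear `make_toy_model` stand-ins for trained DNNs are all hypothetical; this is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical stress class names; the paper's actual classes are not given here.
STRESS_CLASSES = ("unstressed", "weak", "medium", "strong")

# Stand-in type for a trained feedforward DNN: maps a syllable's linguistic
# feature vector to a short F0 contour in Hz.
F0Model = Callable[[Sequence[float]], List[float]]


def make_toy_model(base_hz: float, slope: float, frames: int = 5) -> F0Model:
    """Hypothetical placeholder regressor that emits a linear F0 contour."""
    def predict(features: Sequence[float]) -> List[float]:
        offset = sum(features)  # trivial dependence on the input features
        return [base_hz + offset + slope * t for t in range(frames)]
    return predict


@dataclass
class Syllable:
    features: List[float]  # hypothetical linguistic features
    stress: str            # stress label predicted from the text


def predict_f0(syllables: Sequence[Syllable], models: Dict[str, F0Model]) -> List[float]:
    """Concatenate per-syllable contours, each produced by its stress class's model."""
    trajectory: List[float] = []
    for syl in syllables:
        trajectory.extend(models[syl.stress](syl.features))
    return trajectory


# One placeholder model per stress class.
models = {
    "unstressed": make_toy_model(100.0, -1.0),
    "weak": make_toy_model(110.0, 0.0),
    "medium": make_toy_model(120.0, 1.0),
    "strong": make_toy_model(135.0, 2.0),
}
utterance = [Syllable([0.0], "strong"), Syllable([1.0], "unstressed")]
f0 = predict_f0(utterance, models)
print(f0)  # [135.0, 137.0, 139.0, 141.0, 143.0, 101.0, 100.0, 99.0, 98.0, 97.0]
```

Hard routing by label is only one possible combination rule; an ensemble could just as well weight or average the four networks' outputs, which the abstract leaves unspecified.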