Journal: The International Arab Journal of Information Technology

F_0 Modeling for Isarn Speech Synthesis using Deep Neural Networks and Syllable-level Feature Representation



Abstract

The generation of the fundamental frequency (F_0) plays an important role in speech synthesis because it directly influences the naturalness of synthetic speech. In conventional parametric speech synthesis, F_0 is predicted frame by frame. This method is insufficient to represent F_0 contours over larger units, especially the tone contours of syllables in tonal languages, which deviate as a result of long-term context dependency. This work proposes a syllable-level F_0 model that represents F_0 contours within syllables using syllable-level F_0 parameters comprising sampled F_0 points and dynamic features. A Deep Neural Network (DNN) is used to represent the relationship between syllable-level contextual features and syllable-level F_0 parameters. The proposed model was evaluated using an Isarn speech synthesis system with both large and small training sets. For all training sets, the results of objective and subjective tests indicate that the proposed approach outperforms baseline systems based on hidden Markov models and DNNs that predict F_0 values at the frame level.
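As a rough illustration of what "sampled F_0 points and dynamic features" could look like for a single syllable, here is a minimal sketch. The abstract does not say how many points are sampled or how the dynamic features are computed, so everything in it is an assumption: the function names, the default of five points, linear interpolation, and the (-0.5, 0, 0.5) delta window. It is not the authors' implementation.

```python
from typing import List


def resample_contour(f0: List[float], num_points: int) -> List[float]:
    """Linearly interpolate an F0 contour at num_points equally spaced positions.

    Assumption: the input holds one F0 value per frame for a single syllable,
    with unvoiced frames already interpolated or removed.
    """
    if num_points < 2 or len(f0) < 2:
        raise ValueError("need at least two frames and two sample points")
    last = len(f0) - 1
    samples = []
    for k in range(num_points):
        pos = k * last / (num_points - 1)  # fractional frame index
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        samples.append((1.0 - frac) * f0[lo] + frac * f0[hi])
    return samples


def delta(values: List[float]) -> List[float]:
    """First-order dynamic features with a (-0.5, 0, 0.5) window.

    Edge values are repeated so the output has the same length as the input.
    """
    n = len(values)
    out = []
    for i in range(n):
        prev = values[max(i - 1, 0)]
        nxt = values[min(i + 1, n - 1)]
        out.append((nxt - prev) / 2.0)
    return out


def syllable_f0_parameters(f0: List[float], num_points: int = 5) -> List[float]:
    """Hypothetical syllable-level vector: sampled points followed by their deltas."""
    static = resample_contour(f0, num_points)
    return static + delta(static)
```

Under these assumptions, each syllable yields a fixed-length vector regardless of its duration. A DNN could then regress such a vector from syllable-level contextual features, where a frame-level model would predict one F_0 value per frame.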
