Journal: Neurocomputing

Prosody modeling for syllable based text-to-speech synthesis using feedforward neural networks


Abstract

Prosody plays an important role in improving the quality of text-to-speech (TTS) synthesis systems. In this paper, features related to linguistic and production constraints are proposed for modeling the prosodic parameters of syllables, such as duration, intonation, and intensity. The linguistic constraints are represented by positional, contextual, and phonological features, and the production constraints are represented by articulatory features. Neural network models are explored to capture the implicit duration, F0, and intensity knowledge using these features. The prediction performance of the proposed neural network models is evaluated using objective measures such as average prediction error (μ), standard deviation (σ), and linear correlation coefficient (γ_{X,Y}). Their prediction accuracy is compared with that of other state-of-the-art prosody models used in TTS systems. It is further verified through listening tests conducted after integrating the proposed prosody models into the baseline TTS system. © 2015 Elsevier B.V. All rights reserved.
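The three objective measures named in the abstract can be illustrated with a short sketch. The abstract does not give their formulas, so the definitions below are assumptions: μ is taken as the mean absolute error between actual and predicted values, σ as the standard deviation of those absolute errors, and γ_{X,Y} as the Pearson correlation coefficient. The function name `prediction_metrics` and the sample durations are hypothetical and do not come from the paper.

```python
import math


def prediction_metrics(actual, predicted):
    """Return (mu, sigma, gamma) for two equal-length sequences.

    Assumed definitions (not taken from the paper):
      mu    = mean absolute prediction error
      sigma = population standard deviation of the absolute errors
      gamma = Pearson linear correlation between actual and predicted
    """
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(actual)

    # Absolute error per syllable, then its mean and spread.
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    mu = sum(errors) / n
    sigma = math.sqrt(sum((e - mu) ** 2 for e in errors) / n)

    # Pearson correlation: covariance over the product of deviations.
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p)
              for a, p in zip(actual, predicted)) / n
    std_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual) / n)
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)
    if std_a == 0 or std_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    gamma = cov / (std_a * std_p)
    return mu, sigma, gamma


# Hypothetical syllable durations in milliseconds (illustrative only).
actual_ms = [100.0, 200.0, 300.0]
predicted_ms = [110.0, 190.0, 310.0]
mu, sigma, gamma = prediction_metrics(actual_ms, predicted_ms)
```

The same function would apply unchanged to F0 or intensity values, since it only compares two numeric sequences.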
