Computer Speech and Language

Two-stage intonation modeling using feedforward neural networks for syllable based text-to-speech synthesis


Abstract

This paper proposes a two-stage feedforward neural network (FFNN) based approach for modeling the fundamental frequency (F_0) values of a sequence of syllables. Three kinds of features are proposed for predicting intonation patterns: (i) linguistic constraints represented by positional, contextual and phonological features, (ii) production constraints represented by articulatory features, and (iii) linguistic relevance tilt parameters. In the first stage, the tilt parameters are predicted from the linguistic and production constraints. In the second stage, the F_0 values of the syllables are predicted from the tilt parameters produced by the first stage together with the basic linguistic and production constraints. The prediction performance of the neural network models is evaluated using objective measures: average prediction error (μ), standard deviation (σ) and linear correlation coefficient (γ_{X,Y}). The prediction accuracy of the proposed two-stage FFNN model is compared with that of other statistical models, namely Classification and Regression Tree (CART) and Linear Regression (LR) models. The intonation models are also assessed through listening tests that evaluate the quality of the speech synthesized after the models are incorporated into the baseline system. The evaluation shows that the two-stage FFNN models predict more accurately than the other models.
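The data flow and the three objective measures named in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption, since the abstract gives no formulas or architectures: μ is taken as the mean absolute error, σ as the standard deviation of those absolute errors, and γ_{X,Y} as the Pearson correlation between actual and predicted F_0. The names `two_stage_predict`, `objective_measures`, `toy_stage1` and `toy_stage2` are hypothetical, and the toy functions are simple stand-ins for trained FFNNs, not the paper's models.

```python
import math
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], List[float]]


def two_stage_predict(features: Sequence[float], stage1: Model, stage2: Model) -> List[float]:
    """Stage 1 maps the input features to tilt parameters; stage 2 maps the
    same features plus the predicted tilt parameters to F_0 values."""
    tilt = stage1(features)
    return stage2(list(features) + list(tilt))


def objective_measures(actual: Sequence[float], predicted: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mu, sigma, gamma) under the assumed definitions:
    mu    = mean absolute prediction error,
    sigma = standard deviation of the absolute errors,
    gamma = Pearson correlation between actual and predicted values."""
    n = len(actual)
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    mu = sum(errors) / n
    sigma = math.sqrt(sum((e - mu) ** 2 for e in errors) / n)

    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted)) / n
    sd_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual) / n)
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)
    gamma = cov / (sd_a * sd_p)
    return mu, sigma, gamma


# Hypothetical stand-ins for the two trained networks (not the paper's models).
def toy_stage1(x: Sequence[float]) -> List[float]:
    return [0.5 * sum(x)]            # one fake "tilt parameter"


def toy_stage2(x: Sequence[float]) -> List[float]:
    return [100.0 + 10.0 * sum(x)]   # one fake F_0 value in Hz


if __name__ == "__main__":
    print(two_stage_predict([1.0, 2.0], toy_stage1, toy_stage2))
    print(objective_measures([100.0, 120.0, 140.0], [110.0, 120.0, 130.0]))
```

The point of the structure is that the second stage never sees ground-truth tilt parameters at prediction time, only the first stage's output, which matches the pipeline described in the abstract.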
