International Conference on Affective Computing and Intelligent Interaction

Duration Refinement for Hybrid Speech Synthesis System using Random Forest


Abstract

Hybrid speech synthesis systems, which combine hidden Markov models (HMMs) with unit selection, have recently been widely used and studied in both industry and academia because of their naturalness and expressiveness. However, the target duration, which controls the duration of the selected candidate units, is still predicted by a state-based duration model whose performance is far from satisfactory. As a result, the synthetic speech sounds somewhat bland and even tedious. In this paper, we replace the state-based duration model with a Random Forest (RF). Experiments on an English database show that the new model yields more accurate predictions than the baseline state-based duration model: the average improvement in phone-level RMSE is 4.265 ms (14.6%). Perceptual experiments on the same database further confirm that the proposed model outperforms the baseline model.
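The abstract does not give the features, hyperparameters, or implementation of the RF duration model, so what follows is only a minimal sketch of the general technique under stated assumptions: a from-scratch random-forest regressor (bootstrap-aggregated regression trees, each split chosen from a random subset of features) that predicts phone durations in ms and is scored by RMSE. The feature set (`phone_class`, `stressed`, `phrase_final`), the synthetic toy data, and the global-mean baseline are all hypothetical stand-ins; in particular, the baseline here is not the paper's HMM state-based duration model.

```python
import math
import random
from statistics import mean

# Hypothetical sketch: a from-scratch random-forest regressor for phone
# durations (ms). Features, data, and hyperparameters are illustrative
# assumptions, not the paper's setup.


def sse(values):
    """Sum of squared deviations of values from their mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, depth, max_depth, min_leaf, n_feats, rng):
    """Grow a regression tree; a node is a leaf value or (feature, threshold, left, right)."""
    if depth >= max_depth or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return mean(y)
    best = None  # (score, feature, threshold)
    for f in rng.sample(range(len(X[0])), n_feats):  # random feature subset per split
        for t in sorted({row[f] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)  # lower total SSE = better split
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return mean(y)
    _, f, t = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, min_leaf, n_feats, rng)

    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t, grow(li), grow(ri))


def predict_tree(node, row):
    """Walk down one tree to a leaf value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


class RandomForestRegressor:
    """Bagged regression trees; the prediction is the mean over all trees."""

    def __init__(self, n_trees=20, max_depth=6, min_leaf=2, n_feats=None, seed=0):
        self.n_trees, self.max_depth, self.min_leaf = n_trees, max_depth, min_leaf
        self.n_feats, self.seed, self.trees = n_feats, seed, []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        p = len(X[0])
        k = min(p, self.n_feats or max(1, p // 3))  # default: p/3 features per split
        self.trees = []
        for _ in range(self.n_trees):
            idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         0, self.max_depth, self.min_leaf, k, rng))
        return self

    def predict(self, X):
        return [mean([predict_tree(t, row) for t in self.trees]) for row in X]


def rmse(pred, ref):
    """Root-mean-square error, in the same unit as the durations (ms)."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def make_toy_data(n, rng):
    """Synthetic phones: [phone_class, stressed, phrase_final] -> duration in ms."""
    X, y = [], []
    for _ in range(n):
        phone, stressed = rng.randrange(5), rng.randrange(2)
        final = 1 if rng.random() < 0.2 else 0
        X.append([phone, stressed, final])
        y.append(60 + 15 * phone + 40 * stressed + 50 * final + rng.gauss(0, 5))
    return X, y


rng = random.Random(1)
X_train, y_train = make_toy_data(300, rng)
X_test, y_test = make_toy_data(100, rng)
forest = RandomForestRegressor(n_trees=20, max_depth=6, n_feats=2).fit(X_train, y_train)
rf_rmse = rmse(forest.predict(X_test), y_test)
# Stand-in baseline: always predict the training-set mean duration.
baseline_rmse = rmse([mean(y_train)] * len(y_test), y_test)
print(f"RF RMSE: {rf_rmse:.1f} ms, mean baseline RMSE: {baseline_rmse:.1f} ms")
```

The forest is written in plain Python only to keep the sketch dependency-free; a real system would more likely use a library implementation such as scikit-learn's `RandomForestRegressor`, with linguistic context features taken from the synthesizer's front end.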
