Conference: International Conference on Speech and Computer

A Continuous Vocoder Using Sinusoidal Model for Statistical Parametric Speech Synthesis


Abstract

In our earlier work on statistical parametric speech synthesis, we proposed a source-filter based vocoder using continuous F0 (contF0) in combination with Maximum Voiced Frequency (MVF), which was successfully used with deep learning. The advantage of a continuous vocoder in this scenario is that its parameters are simpler to model than those of conventional vocoders with discontinuous F0. However, our vocoder lacks some degree of naturalness and still does not achieve high-quality speech synthesis compared to well-known vocoders (e.g. STRAIGHT or WORLD). Previous studies have shown that the human voice can be modelled effectively as a sum of sinusoids. In this paper, we first address the design of a continuous vocoder using a sinusoidal synthesis model that is applicable in statistical frameworks. The same three parameters from the analysis part of our previous model are extracted and used in this study. To refine the output of the contF0 estimation, a post-processing approach is used to reduce the unwanted voiced component of unvoiced speech sounds, resulting in a smoother contF0 track. During synthesis, a sinusoidal model with minimum phase is applied to reconstruct speech. Finally, we compare the voice quality of the proposed system to that of the STRAIGHT and WORLD vocoders. Experimental results from objective and subjective evaluations show that the proposed vocoder delivers state-of-the-art vocoder performance in synthesized speech while outperforming our previous continuous F0 based source-filter vocoder.
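
The sum-of-sinusoids idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `synth_frame`, the sampling rate, the frame length, and the flat harmonic amplitudes are all assumptions made for illustration, and phases are simply set to zero rather than derived from a minimum-phase spectral envelope. The sketch only shows how a continuous F0 value and an MVF value could together decide which harmonics are summed in one frame.

```python
import math

def synth_frame(f0, mvf, amps, fs=16000, n_samples=400):
    """Hypothetical helper: sum the harmonics of f0 that lie at or below mvf.

    f0   -- fundamental frequency in Hz (always positive for a continuous F0 track)
    mvf  -- maximum voiced frequency in Hz; harmonics above it are left out
    amps -- amplitudes of harmonics 1, 2, 3, ... (assumed given, e.g. from an envelope)
    """
    # Number of harmonics k with k * f0 <= mvf, capped by the amplitudes supplied.
    n_harm = min(int(mvf // f0), len(amps))
    frame = []
    for n in range(n_samples):
        t = n / fs
        # Zero-phase cosines; a real system would add per-harmonic phase terms.
        sample = sum(amps[k - 1] * math.cos(2 * math.pi * k * f0 * t)
                     for k in range(1, n_harm + 1))
        frame.append(sample)
    return frame

# Example: f0 = 100 Hz and MVF = 450 Hz keep harmonics at 100, 200, 300 and 400 Hz.
frame = synth_frame(100.0, 450.0, [1.0] * 10)
```

When the MVF falls below F0, no harmonic is kept and the frame is silent in this sketch. A full vocoder would presumably fill the band above the MVF with a noise component, which is omitted here.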
