Although formant-based speech synthesizers allow flexible control of acoustic features, their development requires precise estimation of parameters related to the vocal tract and source. This requirement is difficult to satisfy and often limits the quality of the synthesized speech. The difficulty arises because estimating these parameters is a non-linear problem. Fully automatic estimation is therefore quite difficult, and development requires some approximations or manual modification of the parameters based on a priori knowledge. In this study, mainly to make the estimation more efficient and/or to assist developers in manually modifying the parameters, a formant-based analysis-synthesis system is built. The system introduces pitch-synchronous acoustic analysis to reduce fluctuation of the estimated parameters. Experiments show that the quality of synthetic speech for the Japanese /r/ sound is significantly improved by using the proposed system.
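The abstract does not describe how the pitch-synchronous analysis is implemented. As a rough illustration of the general idea only, the sketch below places one analysis frame on each pitch mark, with a frame length that follows the local pitch period. The function name `pitch_synchronous_frames`, the two-period Hann window, and the evenly spaced pitch marks are all assumptions made for this example, not details taken from the study.

```python
import math


def pitch_synchronous_frames(signal, pitch_marks):
    """Cut one Hann-windowed frame per interior pitch mark (illustrative only).

    Each frame runs from the previous mark to the next mark, so its length
    follows the local pitch period instead of a fixed frame size.
    """
    frames = []
    for prev, cur, nxt in zip(pitch_marks, pitch_marks[1:], pitch_marks[2:]):
        segment = signal[prev:nxt]
        n = len(segment)
        # Symmetric Hann window; assumes marks are at least one sample apart.
        window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
        frames.append((cur, [s * w for s, w in zip(segment, window)]))
    return frames


# Toy input: a 100 Hz sine sampled at 8 kHz, i.e. 80 samples per period.
fs, f0 = 8000, 100
signal = [math.sin(2 * math.pi * f0 * t / fs) for t in range(800)]

# Hypothetical pitch marks, assumed to be known and evenly spaced here.
pitch_marks = list(range(0, 800, fs // f0))

frames = pitch_synchronous_frames(signal, pitch_marks)

# Largest sample-wise difference between two frames taken at different marks.
max_diff = max(abs(a - b) for a, b in zip(frames[0][1], frames[3][1]))
```

Because every frame in this toy case starts at the same phase of the waveform, frames taken at different marks are nearly identical, so any parameter estimated from them would vary little from frame to frame. With fixed-length frames that ignore the pitch period, the phase of each frame would drift and the estimates would tend to fluctuate more.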