Journal: IEEE Transactions on Audio, Speech, and Language Processing

Vocal Tract Length Normalization for Statistical Parametric Speech Synthesis


Abstract

Vocal tract length normalization (VTLN) has been used successfully in automatic speech recognition to improve performance. The same technique can be applied in statistical parametric speech synthesis for rapid speaker adaptation during synthesis. This paper presents an efficient implementation of VTLN using expectation maximization and addresses the key challenges of implementing VTLN for synthesis. Jacobian normalization, high-dimensional features, and truncation of the transformation matrix are among the challenges presented, each with an appropriate solution. Detailed evaluations are performed to determine the most suitable technique for using VTLN in speech synthesis. Evaluating VTLN in the framework of speech synthesis is itself not an easy task, since the technique does not work equally well for all speakers. Speakers have been selected based on different objective and subjective criteria to demonstrate the differences between systems. The best method for implementing VTLN is confirmed to be the use of lower-order features for estimating warping factors.
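To make the terms in the abstract concrete, the following is a minimal sketch of warping-factor estimation with a Jacobian term and a truncated transformation matrix. It is not the paper's method: it assumes a bilinear (all-pass) warping function, cepstral features with the convention log S(w) = sum_n c_n cos(n w), a single diagonal Gaussian as the model, and a plain grid search in place of the expectation-maximization procedure the abstract describes. All function names (`bilinear_warp`, `warp_matrix`, `vtln_loglik`, `estimate_alpha`) are hypothetical.

```python
import numpy as np


def bilinear_warp(omega, alpha):
    """All-pass (bilinear) frequency warping of omega in [0, pi]; assumes |alpha| < 1."""
    return omega + 2.0 * np.arctan2(alpha * np.sin(omega), 1.0 - alpha * np.cos(omega))


def warp_matrix(alpha, dim, n_quad=2048):
    """Truncated dim x dim cepstral transform A(alpha), by midpoint-rule integration.

    Assumed convention: log S(w) = sum_n c_n cos(n w), which gives
    A[m, n] = (k_m / pi) * integral_0^pi cos(m w) cos(n * warp(w)) dw,
    with k_0 = 1 and k_m = 2 for m >= 1.
    """
    omega = (np.arange(n_quad) + 0.5) * np.pi / n_quad
    idx = np.arange(dim)[:, None]
    basis = np.cos(idx * omega)                         # rows: cos(m w)
    warped = np.cos(idx * bilinear_warp(omega, alpha))  # rows: cos(n warp(w))
    k = np.where(np.arange(dim) == 0, 1.0, 2.0)[:, None]
    return k * (basis @ warped.T) / n_quad


def vtln_loglik(x, alpha, mean, var):
    """Log-likelihood of frames x (T, dim) under a diagonal Gaussian on A(alpha) x.

    The term T * log|det A(alpha)| is the Jacobian normalization: without it,
    likelihoods computed in differently warped feature spaces are not comparable.
    """
    a = warp_matrix(alpha, x.shape[1])
    y = x @ a.T
    _, logdet = np.linalg.slogdet(a)
    ll = -0.5 * np.sum((y - mean) ** 2 / var + np.log(2.0 * np.pi * var))
    return ll + x.shape[0] * logdet


def estimate_alpha(x, mean, var, grid):
    """Grid search for the warping factor (a stand-in for EM-based estimation)."""
    scores = [vtln_loglik(x, a, mean, var) for a in grid]
    return float(grid[int(np.argmax(scores))])
```

Under these assumptions, restricting estimation to lower-order features amounts to passing only the leading columns, for example `estimate_alpha(x[:, :k], mean[:k], var[:k], grid)`, so that `warp_matrix` is truncated to `k` dimensions. Whether this matches the configuration evaluated in the paper is not stated in the abstract.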

