EURASIP Journal on Audio, Speech, and Music Processing

Speaker-dependent model interpolation for statistical emotional speech synthesis

Abstract

In this article, we propose a speaker-dependent model interpolation method for statistical emotional speech synthesis. The basic idea is to combine the neutral model set of the target speaker with an emotional model set selected from a pool of speakers. For model selection and interpolation-weight determination, we propose a novel monophone-based Mahalanobis distance, a proper distance measure between two hidden Markov model (HMM) sets. We design a Latin-square evaluation to reduce systematic bias in the subjective listening tests. The proposed interpolation method achieves good performance in emotional expressiveness, naturalness, and target-speaker similarity. Moreover, this performance is achieved without collecting any emotional speech from the target speaker, saving the cost of data collection and labeling.
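The abstract only sketches the method, so the following Python sketch shows one plausible reading of its two ingredients: a monophone-level Mahalanobis distance between two HMM sets (reduced here to per-state Gaussian means and covariances) and a linear interpolation of the corresponding means. The data layout (a dict mapping each monophone name to per-state (mean, covariance) pairs), the averaging of state-level distances, the mean-only interpolation, and all helper names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mahalanobis_gaussian(mu_a, mu_b, cov_b):
    # Mahalanobis distance from mean mu_a to the Gaussian (mu_b, cov_b).
    diff = mu_a - mu_b
    return float(np.sqrt(diff @ np.linalg.solve(cov_b, diff)))

def monophone_distance(set_a, set_b):
    # Average state-level Mahalanobis distance over the monophones the two
    # model sets share. Each set maps a monophone name to a list of
    # (mean, covariance) pairs, one pair per emitting HMM state.
    dists = [
        mahalanobis_gaussian(mu_a, mu_b, cov_b)
        for phone in set_a.keys() & set_b.keys()
        for (mu_a, _), (mu_b, cov_b) in zip(set_a[phone], set_b[phone])
    ]
    return float(np.mean(dists))

def select_emotional_set(neutral_target, emotional_pool):
    # Pick the pool speaker whose emotional model set lies closest to the
    # target speaker's neutral set under the monophone-based distance.
    return min(emotional_pool,
               key=lambda spk: monophone_distance(neutral_target,
                                                  emotional_pool[spk]))

def interpolate_means(neutral, emotional, w):
    # Linear interpolation of state means, w*neutral + (1-w)*emotional;
    # neutral covariances are kept unchanged for simplicity (assumption).
    return {
        phone: [
            (w * mu_n + (1.0 - w) * mu_e, cov_n)
            for (mu_n, cov_n), (mu_e, _) in zip(neutral[phone],
                                                emotional[phone])
        ]
        for phone in neutral.keys() & emotional.keys()
    }

# Toy usage: 2-dimensional features, a single monophone "a", one state.
rng = np.random.default_rng(0)
gauss = lambda: (rng.normal(size=2), np.eye(2))
neutral = {"a": [gauss()]}
pool = {"spk1": {"a": [gauss()]}, "spk2": {"a": [gauss()]}}
best = select_emotional_set(neutral, pool)
mixed = interpolate_means(neutral, pool[best], w=0.5)
```

In this sketch the interpolation weight w is a free parameter; in the paper it is determined from the same monophone-based distance, e.g. by giving more weight to whichever model set lies closer under that measure.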
