
A Statistical Approach for Modeling Prosody Features using POS Tags for Emotional Speech Synthesis


Abstract

Deriving statistical models for emotional speech processing is a challenging problem because emotional expression is highly variable. We address this problem by modeling prosodic parameter differences at the part-of-speech (POS) level for emotional utterances, with the goal of emotional speech synthesis. Synthesis at the POS level is appealing because POS tags carry salient information about speech prominence. Analysis of energy, duration, and F0 differences between matching neutral-angry, neutral-sad, and neutral-happy utterance pairs shows that Gaussian distributions can be used to model the parameter differences. Pairwise comparisons of POS features reveal that the normalized mean and median energy of sad POS tags are more likely to be larger than those of neutral, angry, or happy POS tags. They also show that, for particular tags, angry speech is more likely to have a higher F0 median than happy speech, and sad speech is more likely to have a higher F0 median than neutral speech. Experiments on converting neutral speech into emotional speech using the Gaussian probability functions provide helpful insights into the application of statistical models in speech synthesis.
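
The modeling idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`fit_difference_models`, `prob_greater`, `convert`), the choice of F0 median in Hz as the feature, and the toy values are all assumptions made for illustration. The sketch fits one Gaussian per POS tag to the emotional-minus-neutral differences, compares two such Gaussians by the probability that one exceeds the other (assuming independence), and shifts a neutral value by the fitted difference.

```python
import math
import random
from collections import defaultdict
from statistics import NormalDist, mean, stdev


def fit_difference_models(pairs):
    """Fit one Gaussian per POS tag to (emotional - neutral) differences.

    pairs: iterable of (pos_tag, neutral_value, emotional_value).
    Tags with fewer than two samples are skipped (no variance estimate).
    """
    diffs = defaultdict(list)
    for tag, neutral, emotional in pairs:
        diffs[tag].append(emotional - neutral)
    return {
        tag: NormalDist(mean(values), stdev(values))
        for tag, values in diffs.items()
        if len(values) >= 2
    }


def prob_greater(dist_a, dist_b):
    """P(A > B) for independent Gaussian variables A and B."""
    mu = dist_a.mean - dist_b.mean
    sigma = math.sqrt(dist_a.variance + dist_b.variance)
    if sigma == 0.0:
        # Degenerate case: both distributions are point masses.
        return 1.0 if mu > 0 else (0.5 if mu == 0 else 0.0)
    return NormalDist().cdf(mu / sigma)


def convert(neutral_value, tag, models, rng=None):
    """Shift a neutral feature value toward the target emotion.

    Assumption: without rng the mean difference is added; with rng a
    difference is sampled from the fitted Gaussian. Unknown tags are
    returned unchanged.
    """
    model = models.get(tag)
    if model is None:
        return neutral_value
    if rng is None:
        return neutral_value + model.mean
    return neutral_value + rng.gauss(model.mean, model.stdev)


# Invented toy data: (POS tag, neutral F0 median, angry F0 median), in Hz.
toy_pairs = [
    ("NN", 120.0, 150.0), ("NN", 118.0, 142.0), ("NN", 125.0, 160.0),
    ("VB", 110.0, 118.0), ("VB", 115.0, 121.0),
    ("DT", 100.0, 101.0),  # single sample, so no model is fitted
]

if __name__ == "__main__":
    models = fit_difference_models(toy_pairs)
    print(models)
    print(prob_greater(models["NN"], models["VB"]))
    print(convert(100.0, "NN", models, rng=random.Random(0)))
```

Whether the difference should be sampled or taken at its mean during conversion is a design choice the abstract does not settle, so the sketch exposes both.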
