Journal: Cognitive Computation

Time-Scale Feature Extractions for Emotional Speech Characterization


Abstract

Emotional speech characterization is an important issue for understanding interaction. This article discusses the time-scale analysis problem in feature extraction for emotional speech processing. We describe a computational framework for combining segmental and supra-segmental features for emotional speech detection. The statistical fusion is based on the estimation of local a posteriori class probabilities, and the overall decision employs weighting factors directly related to the duration of the individual speech segments. This strategy is applied to a real-world task: detection of Italian motherese in authentic, longitudinal parent-infant interaction at home. The results suggest that short- and long-term information, represented respectively by the short-term spectrum and the prosody parameters (fundamental frequency and energy), provides a robust and efficient time-scale analysis. A similar fusion methodology is also investigated through a phonetic-specific characterization process. This strategy is motivated by the fact that variations across emotional states occur at the phoneme level. A time scale based on both vowels and consonants is proposed, and it provides a relevant and discriminant feature space for acted emotion recognition. The experimental results on two different databases, Berlin (German) and Aholab (Basque), show that the best performance is obtained by our phoneme-dependent approach. These findings demonstrate the relevance of taking into account phoneme dependency (vowels/consonants) for emotional speech characterization.
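The abstract does not give the fusion formula, so the following is only a minimal sketch of one way a duration-weighted fusion of local posteriors could look. It assumes each speech segment already has class posteriors from a segmental classifier and from a supra-segmental classifier, that the two are combined per segment by a convex mixture with a hypothetical parameter `alpha`, and that each segment's weight is its share of the total duration. The function names and the mixing rule are illustrative assumptions, not the authors' method.

```python
from typing import List, Sequence


def fuse_posteriors(
    segmental: Sequence[Sequence[float]],
    suprasegmental: Sequence[Sequence[float]],
    durations: Sequence[float],
    alpha: float = 0.5,
) -> List[float]:
    """Return utterance-level class scores from per-segment posteriors.

    segmental[i][k] and suprasegmental[i][k] are the posteriors of class k
    for segment i; durations[i] is the length of segment i (any unit).
    alpha is a hypothetical mixing weight between the two feature streams.
    """
    if not (len(segmental) == len(suprasegmental) == len(durations)):
        raise ValueError("all inputs must describe the same segments")
    total = sum(durations)
    if total <= 0:
        raise ValueError("total duration must be positive")

    n_classes = len(segmental[0])
    scores = [0.0] * n_classes
    for p_seg, p_sup, dur in zip(segmental, suprasegmental, durations):
        weight = dur / total  # longer segments contribute more
        for k in range(n_classes):
            local = alpha * p_seg[k] + (1.0 - alpha) * p_sup[k]
            scores[k] += weight * local
    return scores


def decide(scores: Sequence[float]) -> int:
    """Pick the index of the class with the highest fused score."""
    return max(range(len(scores)), key=scores.__getitem__)


# Two segments, two classes; the second segment is much longer.
seg = [[0.9, 0.1], [0.2, 0.8]]
sup = [[0.7, 0.3], [0.4, 0.6]]
fused = fuse_posteriors(seg, sup, durations=[0.1, 0.9])
label = decide(fused)
```

In this toy input the long second segment dominates, so the decision is class index 1; with equal durations the same posteriors would yield class index 0, which illustrates why the weighting matters.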
