Published in: IEEE Transactions on Audio, Speech, and Language Processing

Multiview Supervised Dictionary Learning in Speech Emotion Recognition



Abstract

Recently, a supervised dictionary learning (SDL) approach based on the Hilbert-Schmidt independence criterion (HSIC) has been proposed that learns the dictionary and the corresponding sparse coefficients in a space where the dependency between the data and the corresponding labels is maximized. In this paper, two multiview dictionary learning techniques are proposed based on this HSIC-based SDL. One technique learns a single dictionary and the corresponding coefficients in the space of the features fused across all views; the other learns one dictionary in each view and subsequently fuses the sparse coefficients in the spaces of the learned dictionaries. The effectiveness of the proposed multiview learning techniques in using the complementary information of single views is demonstrated in the application of speech emotion recognition (SER). The fully-continuous sub-challenge (FCSC) of the AVEC 2012 dataset is used in two different views: the baseline and spectral energy distribution (SED) feature sets. Four affective dimensions (arousal, expectation, power, and valence) are predicted as continuous response variables using the proposed multiview methods. The results are compared with the single views, the AVEC 2012 baseline system, and other supervised and unsupervised multiview learning approaches in the literature. Using the correlation coefficient as the performance measure in predicting the continuous dimensional affects, it is shown that the proposed approach achieves the highest performance among the compared methods. The relative performance of the two proposed multiview techniques and their relationship are also discussed. In particular, it is shown that imposing an additional constraint on the dictionary of one of these approaches makes it the same as the other.
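The two fusion strategies described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the algorithm, so everything below is an assumption made for illustration: the dictionary is taken as the top eigenvectors of X H K H Xᵀ (with H the centering matrix and K a linear kernel on the responses), the sparse coefficients are obtained by soft-thresholding the projections, and the function names (`hsic_dictionary`, `sparse_codes`, `fuse_features`, `fuse_codes`) are hypothetical. It is not the authors' implementation.

```python
import numpy as np

def hsic_dictionary(X, Y, n_atoms):
    """Assumed HSIC-style dictionary: top eigenvectors of X H K H X^T.

    X: (d, n) features, one sample per column. Y: (n, k) responses.
    """
    n = X.shape[1]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    K = Y @ Y.T                              # linear kernel on responses (assumption)
    M = X @ H @ K @ H @ X.T                  # symmetric (d, d) matrix
    _, eigvecs = np.linalg.eigh(M)           # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :n_atoms]     # columns for the largest eigenvalues

def sparse_codes(D, X, thresh):
    """Soft-threshold the projections D^T X (assumed sparse-coding step)."""
    Z = D.T @ X
    return np.sign(Z) * np.maximum(np.abs(Z) - thresh, 0.0)

def fuse_features(views, Y, n_atoms, thresh):
    """Strategy 1: stack the views, then learn one dictionary on the fused features."""
    X = np.vstack(views)
    D = hsic_dictionary(X, Y, n_atoms)
    return sparse_codes(D, X, thresh)

def fuse_codes(views, Y, atoms_per_view, thresh):
    """Strategy 2: learn one dictionary per view, then stack the coefficients."""
    codes = []
    for X in views:
        D = hsic_dictionary(X, Y, atoms_per_view)
        codes.append(sparse_codes(D, X, thresh))
    return np.vstack(codes)
```

In either case the resulting coefficient matrix (one column per sample) would be passed to a regressor to predict the continuous affect dimensions; the choice of regressor is outside this sketch.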
