Journal: IEEE Transactions on Affective Computing

Prediction-Based Audiovisual Fusion for Classification of Non-Linguistic Vocalisations


Abstract

Prediction plays a key role in recent computational models of the brain, and it has been suggested that the brain constantly makes multisensory spatiotemporal predictions. Inspired by these findings, we tackle the problem of audiovisual fusion from a new, prediction-based perspective. We train predictive models that capture the spatiotemporal relationship between audio and visual features by learning the audio-to-visual and visual-to-audio feature mappings for each class. Similarly, we train predictive models that capture the time evolution of the audio and visual features by learning the past-to-future feature mapping for each class. At classification time, every class-specific regression model produces a prediction of the expected audio/visual features, and the prediction errors are combined per class. The set of class-specific regressors that best describes the audiovisual feature relationship, i.e., yields the lowest prediction error, determines the label of the input frame. We perform cross-database experiments on the AMI, SAL, and MAHNOB databases in order to classify laughter and speech, and subject-independent experiments on the AVIC database in order to classify laughter, hesitation, and consent. In virtually all cases, prediction-based audiovisual fusion consistently outperforms the two most commonly used fusion approaches, decision-level and feature-level fusion.
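The classification scheme described in the abstract, fitting per-class cross-modal regressors and labelling a frame by the class with the lowest combined prediction error, can be sketched as follows. This is a hypothetical minimal illustration using plain least-squares linear maps, not the authors' implementation; the function names, the use of an L2 error, and the omission of the past-to-future temporal models are all simplifying assumptions.

```python
import numpy as np

def fit_class_models(A, V):
    """Fit linear audio-to-visual and visual-to-audio mappings for one class.

    A: (n_frames, d_audio) audio features; V: (n_frames, d_visual) visual
    features. A bias column is appended before solving the least-squares
    problems. Returns the two weight matrices (W_av, W_va).
    """
    A1 = np.hstack([A, np.ones((A.shape[0], 1))])  # audio + bias
    V1 = np.hstack([V, np.ones((V.shape[0], 1))])  # visual + bias
    W_av, *_ = np.linalg.lstsq(A1, V, rcond=None)  # audio -> visual
    W_va, *_ = np.linalg.lstsq(V1, A, rcond=None)  # visual -> audio
    return W_av, W_va

def prediction_error(models, a, v):
    """Combined bidirectional prediction error for one audiovisual frame."""
    W_av, W_va = models
    v_hat = np.append(a, 1.0) @ W_av  # predicted visual features
    a_hat = np.append(v, 1.0) @ W_va  # predicted audio features
    return np.linalg.norm(v - v_hat) + np.linalg.norm(a - a_hat)

def classify(a, v, class_models):
    """Label a frame with the class whose regressors predict it best."""
    errors = {c: prediction_error(m, a, v) for c, m in class_models.items()}
    return min(errors, key=errors.get)
```

On synthetic data where each class has a distinct audio-to-visual relationship (e.g. V = 2A for one class and V = -A for another), the class whose fitted mappings reproduce a test frame with the smallest error is returned, which is the decision rule the abstract describes.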
