Prediction-based classification for audiovisual discrimination between laughter and speech



Abstract

Recent evidence in neuroscience supports the theory that prediction of spatial and temporal patterns in the brain plays a key role in human actions and perception. Inspired by these findings, a system that discriminates laughter from speech by modeling the spatial and temporal relationship between audio and visual features is presented. The underlying assumption is that this relationship differs between speech and laughter. Neural networks are trained to learn the audio-to-visual and visual-to-audio feature mappings, together with the time evolution of audio and visual features, for both classes. Classification of a new frame or sequence is performed via prediction: all the networks produce a prediction of the expected audio and visual features, and their prediction errors are combined for each class. The model that best describes the audiovisual feature relationship, i.e., the one that results in the lowest prediction error, provides its label to the input frame or sequence. Using four different datasets, the proposed system is compared to standard feature-level fusion in cross-database experiments. In almost all test cases, prediction-based classification outperforms feature-level fusion. Similar conclusions are drawn when artificial feature-level noise is added to the datasets.
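The classification rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: ridge-regularised linear maps stand in for the neural networks, the time-evolution predictors are omitted, the two error terms are combined by an unweighted sum, and the data are synthetic. All names (`fit_linear`, `ClassModel`, `classify`, `toy`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_linear(X, Y, ridge=1e-3):
    """Ridge-regularised least-squares map from X to Y, with a bias term.

    Assumption: a linear map replaces the neural networks used in the paper.
    """
    Xb = np.hstack([X, np.ones((len(X), 1))])
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ Y)


def predict(W, X):
    """Apply a map fitted by fit_linear."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return Xb @ W


class ClassModel:
    """Audio-to-visual and visual-to-audio predictors for a single class."""

    def __init__(self, audio, visual):
        self.a2v = fit_linear(audio, visual)
        self.v2a = fit_linear(visual, audio)

    def error(self, audio, visual):
        # Mean squared prediction error per frame, for each direction.
        err_v = np.mean((predict(self.a2v, audio) - visual) ** 2, axis=1)
        err_a = np.mean((predict(self.v2a, visual) - audio) ** 2, axis=1)
        # Assumption: unweighted sum as the combination rule.
        return err_a + err_v


def classify(models, audio, visual):
    """Label each frame with the class whose model predicts it best."""
    labels = list(models)
    errors = np.stack([models[k].error(audio, visual) for k in labels])
    return [labels[i] for i in errors.argmin(axis=0)]


# Synthetic data: each class has its own audio-to-visual relationship.
rng = np.random.default_rng(0)
M = rng.normal(size=(4, 3))


def toy(mapping, n):
    """Draw n frames of 4-d audio and 3-d visual features (toy data)."""
    audio = rng.normal(size=(n, 4))
    visual = audio @ mapping + 0.1 * rng.normal(size=(n, 3))
    return audio, visual


models = {
    "laughter": ClassModel(*toy(M, 500)),
    "speech": ClassModel(*toy(-M, 500)),
}

test_audio, test_visual = toy(M, 200)  # frames drawn from the "laughter" class
predicted = classify(models, test_audio, test_visual)
accuracy = np.mean([p == "laughter" for p in predicted])
```

The key design point is that no discriminative classifier is trained: each class only learns to predict one modality from the other, and the label comes from comparing prediction errors across the class-specific models.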


Bibliographic details

Source: 2011 IEEE International Conference on Automatic Face Gesture Recognition and Workshops, 2011, pp. 619-626 (8 pages)
Authors: Petridis Stavros; Pantic Maja; Cohn Jeffrey F.
Original format: PDF
Chinese Library Classification: Pattern recognition and devices

Similar literature

1. Audiovisual Discrimination Between Speech and Laughter: Why and When Visual Information Might Help [J]. IEEE Transactions on Multimedia, 2011, no. 2.
2. Prediction-Based Audiovisual Fusion for Classification of Non-Linguistic Vocalisations [J]. Stavros Petridis, Maja Pantic. IEEE Transactions on Affective Computing, 2016, no. 1.
3. Visemic processing in audiovisual discrimination of natural speech: A simultaneous fMRI-EEG study [J]. Dubois C., Otzenberger H., Gounot D. Neuropsychologia, 2012, no. 7.
4. Prediction-based classification for audiovisual discrimination between laughter and speech [C]. Petridis Stavros, Pantic Maja, Cohn Jeffrey F. IEEE International Conference on Automatic Face Gesture Recognition and Workshops, 2011.
5. The Role of the Motor System in Speech Perception and the Neural Substrates of Audiovisual Speech Integration [D]. Michaelis, Kelly Cecile. 2019.
6. Lipreading and Audiovisual Speech Recognition across the Adult Lifespan: Implications for Audiovisual Integration [O]. Nancy Tye-Murray, Brent Spehar, Joel Myerson.
7. Audiovisual discrimination between speech and laughter: Why and when visual information might help [O]. Petridis, Stavros; Pantic, Maja. 2011.
