IEEE Transactions on Multimedia

Audiovisual Discrimination Between Speech and Laughter: Why and When Visual Information Might Help


Abstract

Past research on automatic laughter classification/detection has focused mainly on audio-based approaches. Here we present an audiovisual approach to distinguishing laughter from speech, and we show that integrating information from the audio and video channels may lead to improved performance over single-modal approaches. Each channel consists of two streams (cues): facial expressions and head pose for video, and cepstral and prosodic features for audio. Two types of experiments were performed: 1) subject-independent cross-validation on the AMI dataset and 2) cross-database experiments on the AMI and SAL datasets. We experimented with different combinations of cues; the most informative was the combination of facial expressions, cepstral, and prosodic features. Our results suggest that the audiovisual approach performs better on average than single-modal approaches, and that adding visual information produces better results for female subjects. When the training conditions are less diverse in terms of head movements than the testing conditions (training on the SAL dataset, testing on the AMI dataset), no improvement is observed from adding visual information. On the other hand, when the training conditions are similar (cross-validation on the AMI dataset) or more diverse (training on the AMI dataset, testing on the SAL dataset) in terms of head movements than the testing conditions, adding visual information to audio information yields an absolute increase of about 3% in the F1 rate for laughter.
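To make the fusion and evaluation setup concrete, below is a minimal sketch of the general idea rather than the authors' implementation: feature-level fusion of the three most informative cues (facial expressions, cepstral, and prosodic features), subject-independent cross-validation, and the F1 metric for the laughter class. The feature dimensions, the synthetic data standing in for AMI segments, and the logistic-regression classifier are all illustrative assumptions.

```python
# Sketch: feature-level fusion of audiovisual cues with
# subject-independent cross-validation (hypothetical setup).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
n = 200  # number of audiovisual segments (synthetic stand-ins)

# Hypothetical pre-extracted cues, one row per segment:
facial = rng.normal(size=(n, 20))    # facial-expression features (video)
cepstral = rng.normal(size=(n, 13))  # cepstral features, e.g. MFCCs (audio)
prosodic = rng.normal(size=(n, 6))   # prosodic features, e.g. pitch/energy (audio)
labels = rng.integers(0, 2, size=n)      # 1 = laughter, 0 = speech
subjects = rng.integers(0, 10, size=n)   # subject id per segment

# Feature-level fusion: concatenate the cue combination the abstract
# reports as most informative (facial + cepstral + prosodic).
X = np.concatenate([facial, cepstral, prosodic], axis=1)

# Subject-independent cross-validation: no subject appears in both
# the training and the test fold of any split.
f1_per_fold = []
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, labels, groups=subjects):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], labels[train_idx])
    pred = clf.predict(X[test_idx])
    # F1 for the laughter class, the metric reported in the abstract.
    f1_per_fold.append(f1_score(labels[test_idx], pred, pos_label=1))

print(f"mean laughter F1: {np.mean(f1_per_fold):.3f}")
```

A decision-level variant of the same idea would train one classifier per cue and fuse their posterior probabilities instead of concatenating the feature vectors; either scheme supports the per-cue-combination comparisons described above.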
