IEEE International Conference on Signal and Image Processing

Audiovisual Synchrony Detection with Optimized Audio Features

Abstract

Audiovisual speech synchrony detection is an important part of talking-face verification systems. Prior work has primarily focused on visual features and joint-space models, while standard mel-frequency cepstral coefficients (MFCCs) have commonly been used to represent speech. We focus more closely on the audio side by studying the impact of the context window length used in delta feature computation and by comparing MFCCs with simpler energy-based features for lip-sync detection. We select state-of-the-art hand-crafted visual lip-sync features, space-time auto-correlation of gradients (STACOG), and canonical correlation analysis (CCA) for joint-space modeling. To enhance the joint-space modeling, we adopt deep CCA (DCCA), a nonlinear extension of CCA. Our results on the XM2VTS data indicate substantially improved audiovisual speech synchrony detection, with an equal error rate (EER) of 3.68%. Further analysis reveals that failed lip-region localization and beardedness of the subjects account for most of the errors. Thus, the lip-motion description is the bottleneck, and novel audio features or joint-modeling techniques are unlikely to boost lip-sync detection accuracy further.
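To make the two audio-side ingredients concrete, below is a minimal sketch (not the authors' code) of standard regression-based delta feature computation with a configurable context window, the parameter whose length the paper studies, followed by a linear CCA synchrony score. The function names, the half-width parameter N, and the use of scikit-learn's CCA are illustrative assumptions; the paper's DCCA variant replaces the linear projections with learned nonlinear mappings.

    import numpy as np
    from sklearn.cross_decomposition import CCA

    def delta_features(feats, N=2):
        """Regression-based deltas over a symmetric context window.

        feats: (T, D) array of frame-level features (e.g. MFCCs).
        N:     context half-width in frames; the window length whose
               effect on synchrony detection the paper investigates.
        """
        T, _ = feats.shape
        # Repeat edge frames so every frame has a full context window.
        padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
        denom = 2.0 * sum(n * n for n in range(1, N + 1))
        deltas = np.zeros_like(feats, dtype=float)
        for n in range(1, N + 1):
            deltas += n * (padded[N + n : N + n + T] - padded[N - n : N - n + T])
        return deltas / denom

    # Hypothetical usage: score how well an audio/video feature pair
    # moves together in a CCA joint space learned from synchronized
    # training data. All arguments are assumed (T, D) frame-aligned arrays.
    def synchrony_score(audio_train, video_train, audio_test, video_test):
        cca = CCA(n_components=1)
        cca.fit(audio_train, video_train)           # learn the joint space
        a, v = cca.transform(audio_test, video_test)
        return np.corrcoef(a[:, 0], v[:, 0])[0, 1]  # high = likely in sync

Thresholding such a correlation score yields the accept/reject decision from which an EER like the reported 3.68% would be computed.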
