Image and Vision Computing

Robust face-voice based speaker identity verification using multilevel fusion



Abstract

In this paper, we propose a robust multilevel fusion strategy involving cascaded multimodal fusion of audio-lip-face motion, correlation and depth features for biometric person authentication. The proposed approach combines the information from different audio-video based modules, namely the audio-lip motion module, the audio-lip correlation module, and the 2D + 3D motion-depth fusion module, and performs a hybrid cascaded fusion in an automatic, unsupervised and adaptive manner, adapting to the local performance of each module. This is done by taking the output-score based reliability estimates (confidence measures) of each module into account. The module weightings are determined automatically such that the reliability measure of the combined scores is maximised. To test the robustness of the proposed approach, the audio and visual speech (mouth) modalities are degraded to emulate various levels of train/test mismatch, employing additive white Gaussian noise for the audio and JPEG compression for the video signals. The results show improved fusion performance over a range of tested levels of audio and video degradation, compared to the individual module performances. Experiments on the 3D stereovision database AVOZES show that, at severe levels of audio and video mismatch, the audio, mouth, 3D face, and tri-module (audio-lip motion, correlation and depth) fusion EERs were 42.9%, 32%, 15%, and 7.3%, respectively, for the biometric person authentication task.
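The adaptive weighting idea can be illustrated with a minimal sketch: each module's match score is weighted in proportion to its output-score reliability estimate, so that modules degraded by the current noise conditions contribute less to the fused score. The function name, scores, and confidence values below are hypothetical, and this simple proportional weighting stands in for the paper's reliability-maximising weight determination.

```python
import numpy as np

def fuse_scores(module_scores, reliabilities):
    """Weighted-sum score fusion: weights are proportional to each
    module's reliability estimate (confidence measure) and are
    normalised to sum to one."""
    w = np.asarray(reliabilities, dtype=float)
    w = w / w.sum()  # adaptive, per-trial weights
    return float(np.dot(w, np.asarray(module_scores, dtype=float)))

# Hypothetical per-trial match scores from the three modules
# (audio-lip motion, audio-lip correlation, 2D + 3D motion-depth):
scores = [0.62, 0.48, 0.91]
conf = [0.2, 0.3, 0.5]  # reliability estimates under current conditions
fused = fuse_scores(scores, conf)  # ≈ 0.723
```

Because the weights are recomputed per trial from the confidence measures, a module whose signal is heavily degraded (e.g. noisy audio) is automatically down-weighted without any supervision.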
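The reported equal error rates (EERs) are the operating points at which the false-accept rate (FAR) equals the false-reject rate (FRR). A minimal sketch of estimating the EER from genuine and impostor score sets follows; the score values are illustrative, not from the AVOZES experiments.

```python
import numpy as np

def eer(genuine, impostor):
    """Estimate the equal error rate by sweeping the decision
    threshold over all observed scores and taking the point where
    the false-accept rate (FAR) and false-reject rate (FRR) meet."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= t).mean() for t in thresholds])
    frr = np.array([(genuine < t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))  # closest FAR/FRR crossing
    return (far[i] + frr[i]) / 2

# Illustrative overlapping genuine/impostor score distributions:
print(eer([0.8, 0.7, 0.55, 0.4], [0.6, 0.5, 0.3, 0.2]))  # 0.25
```

A lower EER means better verification: the tri-module fusion EER of 7.3% under severe mismatch indicates that the combined system rejects far fewer genuine claims and accepts far fewer impostors than any single modality.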
