International Conference on Audio- and Video-Based Biometric Person Authentication (AVBPA 2005); 20–22 July 2005; Hilton Rye Town, NY (US)

Audio-Visual Speaker Identification via Adaptive Fusion Using Reliability Estimates of Both Modalities



Abstract

An audio-visual speaker identification system is described in which the audio and visual speech modalities are fused by an automatic, unsupervised process that adapts to local classifier performance by taking into account output-score-based reliability estimates of both modalities. Previously reported methods do not consider that both the audio and the visual modalities can be degraded. The visual modality uses the speaker's lip information. To test the robustness of the system, the audio and visual modalities are degraded to emulate various levels of train/test mismatch, employing additive white Gaussian noise for the audio signals and JPEG compression for the visual signals. Experiments are carried out on a large augmented data set from the XM2VTS database. The results show improved audio-visual accuracies at all tested levels of audio and visual degradation, compared to the individual audio or visual modality accuracies. For high mismatch levels, the audio, visual, and auto-adapted audio-visual accuracies are 37.1%, 48%, and 71.4%, respectively.
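The adaptive fusion described in the abstract can be illustrated with a minimal sketch. The reliability estimator below (the margin between the best and second-best class scores) is one common score-based heuristic and an assumption for illustration; the paper's exact estimator, feature extraction, and classifiers are not reproduced here.

```python
def reliability(scores):
    # Margin between the best and second-best class scores,
    # used as a score-based confidence proxy (a common heuristic;
    # an assumption here, not the paper's exact estimator).
    top2 = sorted(scores, reverse=True)[:2]
    return top2[0] - top2[1]

def fuse(audio_scores, visual_scores):
    # Normalize each modality's scores to a distribution over
    # speakers, then weight the modalities by relative reliability,
    # so the less degraded modality dominates the fused decision.
    a = [s / sum(audio_scores) for s in audio_scores]
    v = [s / sum(visual_scores) for s in visual_scores]
    ra, rv = reliability(a), reliability(v)
    wa = ra / (ra + rv) if (ra + rv) > 0 else 0.5
    return [wa * ai + (1 - wa) * vi for ai, vi in zip(a, v)]

# Toy scores: degraded audio (small margin) vs. clean visual (large margin).
audio = [0.2, 0.5, 0.3]
visual = [0.05, 0.85, 0.1]
fused = fuse(audio, visual)
best = max(range(len(fused)), key=fused.__getitem__)
print(best)  # index of the identified speaker
```

Because the visual margin is larger in this toy case, the visual modality receives the larger weight, mirroring the paper's idea that fusion should adapt automatically when one modality is degraded.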

