
Robust visual features for the multimodal identification of unregistered speakers in TV talk-shows


Abstract

In this paper we propose a novel multimodal method for identifying unregistered speakers in a TV talk-show using a semi-supervised learning approach based on Support Vector Machines. Our study highlights the fact that specific visual features prove to be very efficient for this particular type of video content, which is edited from multi-camera recordings. These visual features, motivated by prior knowledge of the approach followed by the TV director in choosing the appropriate shots, are found to bring a significant improvement in identification accuracy when used together with classic audio Mel-frequency cepstral coefficients (+8% compared to various baseline systems, in particular a standard audio-only system).
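The abstract names two ingredients: combining audio and visual features, and learning from unlabeled data. The sketch below illustrates them under strong assumptions and is not the authors' method. It uses plain concatenation for the fusion and generic self-training for the semi-supervised step, and a nearest-centroid classifier stands in for the Support Vector Machine so that the example needs only the standard library. All function names, the `min_margin` threshold, and the toy feature values are hypothetical.

```python
from math import dist

def fuse(audio_vec, visual_vec):
    """Early fusion (assumed): concatenate audio and visual feature vectors."""
    return list(audio_vec) + list(visual_vec)

def centroids(samples, labels):
    """Mean feature vector for each speaker label."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(cents, x):
    """Return (label, margin): nearest centroid and the distance gap to the runner-up."""
    ranked = sorted((dist(c, x), y) for y, c in cents.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else float("inf")
    return ranked[0][1], margin

def self_train(labeled, labels, unlabeled, min_margin=0.5, rounds=5):
    """Self-training: move confidently classified unlabeled samples into the training set."""
    labeled, labels, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(rounds):
        cents = centroids(labeled, labels)
        undecided = []
        for x in pool:
            y, margin = predict(cents, x)
            if margin >= min_margin:
                labeled.append(x)
                labels.append(y)
            else:
                undecided.append(x)
        if len(undecided) == len(pool):
            break  # nothing new was labeled, so further rounds change nothing
        pool = undecided
    return centroids(labeled, labels)

# Toy data (invented): two audio values and one visual value per segment.
seed = [fuse([0.0, 0.1], [1.0]), fuse([1.0, 0.9], [0.0])]
seed_labels = ["A", "B"]
unlabeled = [fuse([0.1, 0.0], [0.9]), fuse([0.9, 1.0], [0.1]), fuse([0.5, 0.5], [0.5])]
model = self_train(seed, seed_labels, unlabeled)
speaker, _ = predict(model, fuse([0.2, 0.1], [0.8]))
```

In this toy run the ambiguous third segment is never absorbed into the training set, because its margin stays below `min_margin`. A real system would replace the centroid classifier with an SVM and use actual Mel-frequency cepstral coefficients and shot-based visual descriptors.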

Bibliographic details

Source: 17th IEEE International Conference on Image Processing | 2010 | pp. 1469-1472 | 4 pages
Authors: Vallet F.; Essid S.; Carrive J.; Richard G.
Original format: PDF
CLC classification: TP391.41
Keywords: image analysis; multimedia databases; multimedia systems; pattern classification

Similar literature

1. A Multimodal Approach to Speaker Diarization on TV Talk-Shows [J]. Vallet F., Essid S., Carrive J. IEEE Transactions on Multimedia, 2013, No. 3.
2. Multimodal (audiovisual) source separation exploiting multi-speaker tracking, robust beamforming and time-frequency masking [J]. Naqvi S.M., Wang W., Khan M.S. IET Signal Processing, 2012, No. 5.
3. A Visual Signal Reliability for Robust Audio-Visual Speaker Identification [J]. Md. Tariquzzaman, Jin Young Kim, Seung You Na. IEICE Transactions on Information and Systems, 2011, No. 10.
4. Robust Visual Features for the Multimodal Identification of Unregistered Speakers in TV Talk-Shows [C]. Felicien Vallet, Slim Essid, Jean Carrive. IEEE International Conference on Image Processing, 2010.
5. Robust features for speaker identification [D]. Assaleh, Khaled Talal. 1993.
6. Multimodal Speaker Diarization Using a Pre-Trained Audio-Visual Synchronization Model [O]. Rehan Ahmad, Syed Zubair, Hani Alquhayz. 2019.
7. Deep complementary features for speaker identification in TV broadcast data [O]. Budnik, Mateusz; Besacier, Laurent; Khodabakhsh, Ali. 2016.
8. Integrated Feature Normalization and Enhancement for Robust Speaker Recognition Using Acoustic Factor Analysis (Preprint) [R]. Hasan, T.; Hansen, J. H. 2012.
