VALID: A New Practical Audio-Visual Database, and Comparative Results

Abstract

The performance of deployed audio, face, and multi-modal person recognition systems in non-controlled scenarios is typically lower than that of systems developed in highly controlled environments. To facilitate the development of robust audio, face, and multi-modal person recognition systems, the new large and realistic multi-modal (audio-visual) VALID database was acquired in a noisy "real world" office scenario with no control over illumination or acoustic noise. In this paper we describe the acquisition and content of the VALID database, which consists of five recording sessions of 106 subjects over a period of one month. Speaker identification experiments using visual speech features extracted from the mouth region are reported. The performance on the uncontrolled VALID database is compared with that on the controlled XM2VTS database. The best VALID and XM2VTS based accuracies are 63.21% and 97.17%, respectively. This highlights the degrading effect of an uncontrolled illumination environment and the importance of this database for deploying real-world applications. The VALID database is available to the academic community through http://ee.ucd.ie/validdb/.