Conference on Applications and Science of Computational Intelligence IV, Apr 17-18, 2001, Orlando, USA

Multimodal fusion of polynomial classifiers for automatic person recognition



Abstract

With the prevalence of the information age, privacy and personalization are at the forefront of today's society. As such, biometrics are viewed as essential components of current and evolving technological systems. Consumers demand unobtrusive and non-invasive approaches. In our previous work, we have demonstrated a speaker verification system that meets these criteria. However, there are additional constraints for fielded systems. The required recognition transactions are often performed in adverse environments and across diverse populations, necessitating robust solutions. There are two significant problem areas in current-generation speaker verification systems. The first is the difficulty in acquiring clean audio signals (in all environments) without encumbering the user with a head-mounted close-talking microphone. Second, unimodal biometric systems do not work for a significant percentage of the population. To combat these issues, multimodal techniques are being investigated to improve system robustness to environmental conditions, as well as to improve overall accuracy across the population. We propose a multimodal approach that builds on our current state-of-the-art speaker verification technology. In order to maintain the transparent nature of the speech interface, we focus on optical sensing technology to provide the additional modality, giving us an audio-visual person recognition system. For the audio domain, we use our existing speaker verification system. For the visual domain, we focus on lip motion. This is chosen, rather than static face or iris recognition, because it provides dynamic information about the individual. In addition, the lip dynamics can aid speech recognition to provide liveness testing. The visual processing method makes use of both color and edge information, combined within a Markov random field (MRF) framework, to localize the lips. Geometric features are extracted and input to a polynomial classifier for the person recognition process.
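The abstract does not spell out the training procedure for the polynomial classifier, but the standard approach in this line of work expands each feature vector into monomial terms and fits one weight vector per speaker by least squares. The function and parameter names below are illustrative, not taken from the paper; this is a minimal sketch of the general technique, assuming degree-2 expansion and ideal 0/1 targets:

```python
import numpy as np
from itertools import combinations_with_replacement

def poly_expand(x, degree=2):
    """Expand a feature vector into all monomials up to `degree`:
    1, x_i, x_i*x_j, ... (the polynomial basis)."""
    terms = [1.0]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(len(x)), d):
            terms.append(float(np.prod(x[list(idx)])))
    return np.array(terms)

def train_polynomial_classifier(features, labels, degree=2):
    """Fit one weight vector per class by least squares:
    w_c = argmin ||P w - t_c||, with t_c = 1 for class c, else 0."""
    P = np.array([poly_expand(f, degree) for f in features])
    models = {}
    for c in set(labels):
        t = np.array([1.0 if l == c else 0.0 for l in labels])
        w, *_ = np.linalg.lstsq(P, t, rcond=None)
        models[c] = w
    return models

def score(models, feature, degree=2):
    """Score a test feature against every enrolled class model."""
    p = poly_expand(feature, degree)
    return {c: float(w @ p) for c, w in models.items()}
```

In a verification setting the claimed speaker's score would be compared against a threshold rather than against the other classes.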
A late integration approach, based on a probabilistic model, is employed to combine the two modalities. The system is tested on the XM2VTS database combined with AWGN (in the audio domain) over a range of signal-to-noise ratios.
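The paper's probabilistic fusion model is not detailed in the abstract; a common late-integration baseline combines per-modality scores with a weighted sum, and the AWGN test condition corrupts the audio at a chosen signal-to-noise ratio. Both helpers below are hedged sketches under those assumptions, with names invented for illustration:

```python
import numpy as np

def add_awgn(signal, snr_db, seed=0):
    """Corrupt a signal with additive white Gaussian noise at a
    target SNR in dB (the audio-degradation setup in the evaluation)."""
    sig_power = np.mean(signal ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    rng = np.random.default_rng(seed)
    return signal + rng.normal(0.0, np.sqrt(noise_power), signal.shape)

def fuse_scores(audio_score, visual_score, audio_weight=0.5):
    """Late integration: weighted combination of per-modality scores.
    In practice the weight might be tuned to the estimated audio SNR,
    shifting reliance toward the visual channel as noise increases."""
    return audio_weight * audio_score + (1.0 - audio_weight) * visual_score
```

Because fusion happens at the score level, each modality's front end and classifier can be developed and tuned independently.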
