Journal: Signal Processing

Multimodal speaker/speech recognition using lip motion, lip texture and audio

Abstract

We present a new multimodal speaker/speech recognition system that integrates the audio, lip texture and lip motion modalities. Fusion of the audio and face texture modalities has been investigated in the literature before. The emphasis of this work is to investigate the benefits of including the lip motion modality for two distinct tasks: speaker recognition and speech recognition. The audio modality is represented by the well-known mel-frequency cepstral coefficients (MFCC) along with their first and second derivatives, whereas the lip texture modality is represented by the 2D-DCT coefficients of the luminance component within a bounding box around the lip region. In this paper, we employ a new lip motion modality representation, based on discriminative analysis of the dense motion vectors within the same bounding box, for speaker/speech recognition. The fusion of the audio, lip texture and lip motion modalities is performed by the so-called reliability weighted summation (RWS) decision rule. Experimental results show that including the lip motion modality provides further performance gains over fusion of audio and lip texture alone, in both speaker identification and isolated word recognition scenarios. (c) 2006 Published by Elsevier B.V.
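For concreteness, the fragment below sketches how a reliability weighted summation (RWS) decision rule of this kind can be implemented: each modality produces a score per candidate class, and the scores are combined by a weighted sum before picking the best class. This is only an illustration under stated assumptions; the abstract does not specify how the per-modality scores are normalized or how the reliability weights are estimated, so the `rws_fuse` helper, the score values and the weights here are hypothetical.

```python
import numpy as np

def rws_fuse(scores_per_modality, reliabilities):
    """Fuse per-class scores from several modalities by a reliability weighted sum."""
    weights = np.asarray(reliabilities, dtype=float)
    weights = weights / weights.sum()          # normalize reliabilities to sum to 1
    scores = np.stack([np.asarray(s, dtype=float) for s in scores_per_modality])
    fused = weights @ scores                   # reliability weighted summation over modalities
    return int(np.argmax(fused)), fused

# Hypothetical example: three modalities (audio, lip texture, lip motion) scoring
# four candidate classes (speakers or isolated words); scores assumed pre-normalized.
audio   = [0.10, 0.55, 0.20, 0.15]
texture = [0.25, 0.40, 0.20, 0.15]
motion  = [0.15, 0.45, 0.30, 0.10]
best, fused = rws_fuse([audio, texture, motion], reliabilities=[0.5, 0.3, 0.2])
print(best, fused)  # the class with the highest fused score is selected
```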