Journal: Signal Processing

Analysis of multimodal sequences using geometric video representations

Abstract

This paper presents a novel method for correlating audio and visual data generated by the same physical phenomenon, based on a sparse geometric representation of video sequences. The video signal is modeled as a sum of geometric primitives evolving through time that jointly describe the geometric and motion content of the scene. The displacement through time of relevant visual features, such as a speaker's mouth, can thus be compared with the evolution of an audio feature to assess the correspondence between the acoustic and visual signals. Experiments show that the proposed approach can localize and track the speaker's mouth when several people are present in the scene, in the presence of distracting motion, and without prior face or mouth detection. © 2006 Elsevier B.V. All rights reserved.
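The comparison described in the abstract, matching the displacement of a visual feature against the evolution of an audio feature, can be illustrated with a minimal sketch. The abstract does not say which similarity measure or audio feature the authors use, so this sketch assumes Pearson correlation and a per-frame audio energy value as stand-ins. The function names (`displacement`, `pearson`, `most_correlated_primitive`) and the toy trajectories are hypothetical and are not taken from the paper.

```python
import math


def displacement(track):
    """Frame-to-frame displacement magnitude of one primitive's (x, y) track."""
    return [math.hypot(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(track, track[1:])]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0


def most_correlated_primitive(tracks, audio_feature):
    """Return (index, score) of the track whose motion best follows the audio.

    Assumption: audio_feature holds one value per frame transition,
    i.e. len(track) - 1 values for every track.
    """
    scores = [pearson(displacement(t), audio_feature) for t in tracks]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy data (invented for illustration): audio energy per frame transition.
audio = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]

# Track 0 moves only when the audio is active, like a speaking mouth.
mouth = [(10.0, 20.0), (10.0, 20.0), (10.0, 21.0), (10.0, 21.0),
         (10.0, 22.0), (10.0, 22.0), (10.0, 23.0)]
# Track 1 drifts at constant speed: distracting motion unrelated to the audio.
drift = [(float(i), 5.0) for i in range(7)]
# Track 2 moves in bursts that do not line up with the audio.
bursts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 0.0),
          (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]

best_index, best_score = most_correlated_primitive([mouth, drift, bursts], audio)
print(best_index, best_score)
```

In this toy setup the primitive whose motion follows the audio receives the highest score, while constant or unrelated motion scores zero. This mirrors, in a very simplified way, how a speaker's mouth could be singled out without prior face or mouth detection.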