Source: Multimodal Technologies for Perception of Humans; Lecture Notes in Computer Science; 4122

An Audio-Visual Particle Filter for Speaker Tracking on the CLEAR'06 Evaluation Dataset

Abstract

We present an approach for tracking a lecturer during the course of his speech. We use features from multiple cameras and microphones, and process them in a joint particle filter framework. The filter performs sampled projections of 3D location hypotheses and scores them using features from both audio and video. On the video side, the features are based on foreground segmentation, multi-view face detection and upper body detection. On the audio side, the time delays of arrival between pairs of microphones are estimated with a generalized cross correlation function. In the CLEAR'06 evaluation, the system yielded a tracking accuracy (MOTA) of 71% for video-only, 55% for audio-only and 90% for combined audio-visual tracking.
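The audio-side step mentioned in the abstract, estimating the time delay of arrival between a microphone pair with a generalized cross correlation, can be illustrated with a minimal sketch. The abstract does not say which GCC weighting the authors used; the PHAT weighting below is an assumption, and the function name `gcc_delay`, its parameters, and the synthetic test signal are hypothetical and chosen only for illustration.

```python
import numpy as np

def gcc_delay(sig, ref, fs, max_tau=None):
    """Estimate the delay (in seconds) of `sig` relative to `ref`.

    Uses a generalized cross correlation with PHAT weighting
    (an assumed weighting, not confirmed by the abstract).
    """
    n = len(sig) + len(ref)              # zero-pad to avoid circular wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)           # cross-power spectrum
    cross = cross / (np.abs(cross) + 1e-12)  # PHAT: discard magnitude, keep phase
    cc = np.fft.irfft(cross, n=n)        # correlation as a function of lag

    max_shift = n // 2
    if max_tau is not None:
        # restrict the search to physically plausible delays
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # reorder so that index `max_shift` corresponds to zero lag
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(cc)) - max_shift
    return shift / fs

# Synthetic check: delay white noise by 5 samples at an assumed 16 kHz rate.
rng = np.random.default_rng(0)
fs = 16000
ref = rng.standard_normal(1024)
sig = np.concatenate((np.zeros(5), ref[:-5]))
estimated_samples = round(gcc_delay(sig, ref, fs) * fs)
```

In a tracker of the kind described, a delay estimated this way for each microphone pair could be compared against the delay implied by each 3D location hypothesis to produce an audio score; how the authors combine these scores is not stated in the abstract.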
