Audiovisual Probabilistic Tracking of Multiple Speakers in Meetings

Gatica-Perez D.; Lathoud G.; Odobez J.-M.; McCowan I.

Abstract

Tracking speakers in multiparty conversations constitutes a fundamental task for automatic meeting analysis. In this paper, we present a novel probabilistic approach to jointly track the location and speaking activity of multiple speakers in a multisensor meeting room equipped with a small microphone array and multiple uncalibrated cameras. Our framework is based on a mixed-state dynamic graphical model defined on a multiperson state-space, which includes the explicit definition of a proximity-based interaction model. The model integrates audiovisual (AV) data through a novel observation model. Audio observations are derived from a source localization algorithm. Visual observations are based on models of the shape and spatial structure of human heads. Approximate inference in our model, needed given its complexity, is performed with a Markov Chain Monte Carlo particle filter (MCMC-PF), which results in high sampling efficiency. We present results, based on an objective evaluation procedure, that show that our framework 1) is capable of locating and tracking the position and speaking activity of multiple meeting participants engaged in real conversations with good accuracy, 2) can deal with cases of visual clutter and occlusion, and 3) significantly outperforms a traditional sampling-based approach.
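
The abstract names the ingredients (a mixed-state multiperson state, a proximity-based interaction model, an AV observation model, and MCMC-PF inference) but gives no implementation details. The following is only a minimal sketch of how such a sampler could be organized, not the authors' implementation. Everything specific in it is an assumption made for illustration: 1-D positions, Gaussian kernels, the exponential form of the interaction penalty, the toy likelihood, all parameter values, and the function names.

```python
import math
import random

def gauss(x, mu, sigma):
    """Unnormalized Gaussian kernel; constants cancel in the acceptance ratio."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def interaction(state, min_dist=0.3):
    """Proximity-based penalty (assumed form): down-weights overlapping people."""
    w = 1.0
    for i in range(len(state)):
        for j in range(i + 1, len(state)):
            d = abs(state[i][0] - state[j][0])
            w *= math.exp(-5.0 * max(0.0, min_dist - d))
    return w

def dynamics(state, prev_particles, sigma=0.1, p_stay=0.8):
    """Mixture over previous samples of per-person motion and speaking-switch terms."""
    total = 0.0
    for prev in prev_particles:
        p = 1.0
        for (x, s), (px, ps) in zip(state, prev):
            p *= gauss(x, px, sigma) * (p_stay if s == ps else 1.0 - p_stay)
        total += p
    return total / len(prev_particles)

def likelihood(state, visual_obs, audio_obs, sv=0.05, sa=0.2):
    """Toy AV likelihood: one visual position per person, one audio source or None."""
    p = 1.0
    for (x, s), v in zip(state, visual_obs):
        p *= gauss(x, v, sv)
        if audio_obs is None:
            p *= 0.02 if s else 1.0  # no sound detected: speaking is unlikely
        else:
            # sound detected: a speaker should be near the localized source
            p *= gauss(x, audio_obs, sa) if s else 0.02
    return p

def mcmc_pf_step(prev_particles, visual_obs, audio_obs,
                 n_samples=200, burn_in=100, rng=random):
    """One time step: Metropolis-Hastings over the joint state, one person per move."""
    def target(st):
        return (likelihood(st, visual_obs, audio_obs)
                * interaction(st) * dynamics(st, prev_particles))

    state = list(rng.choice(prev_particles))
    p_cur = target(state)
    samples = []
    for it in range(burn_in + n_samples):
        i = rng.randrange(len(state))      # pick one person to update
        x, s = state[i]
        if rng.random() < 0.2:
            s = not s                      # discrete move: flip speaking flag
        else:
            x += rng.gauss(0.0, 0.05)      # continuous move: random walk
        cand = state[:i] + [(x, s)] + state[i + 1:]
        p_new = target(cand)
        # Both moves are symmetric, so the ratio of targets is the MH ratio.
        if p_cur == 0.0 or rng.random() < p_new / p_cur:
            state, p_cur = cand, p_new
        if it >= burn_in:
            samples.append(list(state))
    return samples

if __name__ == "__main__":
    prev = [[(0.0, False), (1.0, False)] for _ in range(10)]
    out = mcmc_pf_step(prev, visual_obs=[0.02, 1.0], audio_obs=1.0,
                       rng=random.Random(0))
    print(sum(st[1][1] for st in out) / len(out))  # share of samples with person 1 speaking
```

Each state is a list of `(position, speaking)` pairs, one per person, so a single sample carries both the continuous and the discrete part of the mixed state. Updating one person per move keeps the acceptance rate reasonable as the number of people grows, which is the usual motivation for MCMC sampling over a joint multiperson space.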

Bibliographic details

Source: IEEE Transactions on Audio, Speech, and Language Processing, 2007, issue 2, pp. 601-616 (16 pages)
Authors: Gatica-Perez D.; Lathoud G.; Odobez J.-M.; McCowan I.
Original format: PDF
Language: English
Chinese Library Classification: automation technology, computer technology
Keywords: Markov processes; Monte Carlo methods; architectural acoustics; audio acoustics; audio signal processing; audio visual systems; face recognition; microphone arrays; particle filtering (numerical methods); speaker recognition; Markov chain Monte Carlo particle filt;

Similar literature

1. Multimodal (audiovisual) source separation exploiting multi-speaker tracking, robust beamforming and time-frequency masking [J]. Naqvi S.M., Wang W., Khan M.S. Signal Processing, IET. 2012, issue 5
2. Tracking of multiple moving speakers with multiple microphone arrays [J]. Potamitis I., Huimin Chen, Tremoulis G. IEEE Transactions on Speech and Audio Processing. 2004, issue 5
3. Searching for audiovisual correspondence in multiple speaker scenarios [J]. Agnès Alsius, Salvador Soto-Faraco. Experimental Brain Research. 2011, issue 2-3
4. Audiovisual speaker localization in medium smart meeting room [C]. Ronzhin A., Ronzhin A., Budkov V. Information, Communications and Signal Processing (ICICS) 2011 8th International Conference on. 2011
5. Probabilistic correspondence mapping for audiovisual speaker modeling [D]. Liu, Ming. 2007
6. Audiovisual perceptual learning with multiple speakers [O]. Aaron D. Mitchel, Chip Gerfen, Daniel J. Weiss
7. Audiovisual probabilistic tracking of multiple speakers in meetings [O]. Daniel Gatica-Perez, Jean-Marc Odobez, Guillaume Lathoud. 2007
