Audiovisual Localization of Multiple Speakers in a Video Teleconferencing Setting

Bill Kapralos; Michael R. M. Jenkin; Evangelos Milios

Abstract

Attending to multiple speakers in a video teleconferencing setting is a complex task. From a visual point of view, multiple speakers can occur at different locations and present radically different appearances. From an audio point of view, multiple speakers may be speaking at the same time, and background noise may make it difficult to localize sound sources without some a priori estimate of the sound source locations. This article presents a novel sensor and corresponding sensing algorithms to address the task of attending, simultaneously, to multiple speakers for video teleconferencing. A panoramic visual sensor is used to capture a 360° view of the speakers in the environment, and from this view potential speakers are identified via a color histogram approach. A directional audio system based on beamforming is then used to confirm potential speakers and attend to them. An experimental evaluation of the sensor and its algorithms is presented, including sample performance of the entire system in a teleconferencing setting.
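The abstract names two stages, color-histogram detection of candidate speakers in the panorama followed by beamforming toward each candidate, but gives no implementation detail. The following is only a minimal sketch of that general idea, not the authors' method. It assumes a hue-only histogram with back-projection, a planar microphone array with known coordinates, and a far-field delay-and-sum beamformer. All function names, parameters, and thresholds are hypothetical.

```python
import numpy as np


def hue_histogram(hues, bins=16):
    """Normalized histogram of hue samples in [0, 1] (assumed skin reference)."""
    hist, _ = np.histogram(hues, bins=bins, range=(0.0, 1.0))
    return hist / max(hist.sum(), 1)


def candidate_columns(hue_image, skin_hist, threshold):
    """Back-project the reference histogram onto a panoramic hue image and
    return the indices of columns whose mean likelihood exceeds threshold."""
    bins = len(skin_hist)
    idx = np.minimum((hue_image * bins).astype(int), bins - 1)
    likelihood = skin_hist[idx]            # per-pixel back-projection
    column_score = likelihood.mean(axis=0)
    return np.flatnonzero(column_score > threshold)


def column_to_azimuth(col, width):
    """Map a panorama column to an azimuth in radians (assumes a linear
    360-degree unwrapping; real sensors need calibration)."""
    return 2.0 * np.pi * col / width


def delay_and_sum_power(signals, mic_xy, azimuth, fs, c=343.0):
    """Steer a far-field delay-and-sum beamformer toward `azimuth` and
    return the mean power of its output.

    signals: (n_mics, n_samples) array, mic_xy: (n_mics, 2) positions in
    metres, fs: sampling rate in Hz, c: assumed speed of sound in m/s.
    """
    direction = np.array([np.cos(azimuth), np.sin(azimuth)])
    # Arrival time at each microphone relative to the array origin:
    # microphones closer to the source hear the wavefront earlier.
    delays = -(mic_xy @ direction) / c
    n = signals.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectra = np.fft.rfft(signals, axis=1)
    # Advance each channel by its delay so the channels line up in time.
    aligned = spectra * np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
    output = np.fft.irfft(aligned.mean(axis=0), n=n)
    return float(np.mean(output ** 2))
```

Under these assumptions, a visual candidate could be confirmed as an active speaker when the beamformer output power at its azimuth clearly exceeds the power measured at other directions. The decision rule and any threshold would have to be tuned for the actual sensor.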


Bibliographic details

Source: International Journal of Imaging Systems and Technology, 2003, Issue 1, pp. 95-105 (11 pages)
Authors: Bill Kapralos; Michael R. M. Jenkin; Evangelos Milios
Affiliation: Department of Computer Science, York University, Toronto, Ontario, Canada
Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Chinese Library Classification: Photographic technology
Keywords: teleconferencing; active vision sensor; face detection system

