Journal: International Journal of Imaging Systems and Technology

Audiovisual Localization of Multiple Speakers in a Video Teleconferencing Setting


Abstract

Attending to multiple speakers in a video teleconferencing setting is a complex task. From a visual point of view, multiple speakers can occur at different locations and present radically different appearances. From an audio point of view, multiple speakers may be speaking at the same time, and background noise may make it difficult to localize sound sources without some a priori estimate of the sound source locations. This article presents a novel sensor and corresponding sensing algorithms to address the task of attending, simultaneously, to multiple speakers for video teleconferencing. A panoramic visual sensor is used to capture a 360° view of the speakers in the environment, and from this view potential speakers are identified via a color histogram approach. A directional audio system based on beamforming is then used to confirm potential speakers and attend to them. Experimental evaluation of the sensor and its algorithms is presented, including sample performance of the entire system in a teleconferencing setting.
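
The abstract does not describe the beamformer's design. As a rough illustration only, the sketch below uses delay-and-sum beamforming (a standard technique, not necessarily the one used in the article) to check which visually proposed directions actually carry acoustic energy. The circular microphone layout, the sample rate, the relative power threshold, and every function name here are assumptions made for this example.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def circular_array(num_mics, radius):
    """Hypothetical layout: microphones evenly spaced on a circle (metres)."""
    return [(radius * math.cos(2 * math.pi * k / num_mics),
             radius * math.sin(2 * math.pi * k / num_mics))
            for k in range(num_mics)]


def steering_delays(mics, azimuth_deg, sample_rate):
    """Non-negative integer sample delays of a far-field wave from azimuth_deg."""
    az = math.radians(azimuth_deg)
    ux, uy = math.cos(az), math.sin(az)
    # A microphone lying further toward the source hears the wavefront earlier.
    arrivals = [-(x * ux + y * uy) / SPEED_OF_SOUND for x, y in mics]
    earliest = min(arrivals)
    return [round((t - earliest) * sample_rate) for t in arrivals]


def simulate_source(mics, azimuth_deg, signal, sample_rate):
    """Build one delayed, zero-padded copy of `signal` per microphone."""
    delays = steering_delays(mics, azimuth_deg, sample_rate)
    pad = max(delays)
    return [[0.0] * d + list(signal) + [0.0] * (pad - d) for d in delays]


def steered_power(channels, delays):
    """Mean power of the delay-and-sum output for the given channel delays."""
    n = len(channels[0]) - max(delays)
    if n <= 0:
        raise ValueError("recording is shorter than the largest steering delay")
    total = 0.0
    for i in range(n):
        # Undo each channel's delay, then average across microphones.
        s = sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
        total += s * s
    return total / n


def confirm_candidates(channels, mics, candidates_deg, sample_rate, ratio=0.5):
    """Keep candidate azimuths whose steered power is near the strongest one.

    The relative threshold `ratio` is an assumption of this sketch; it always
    keeps at least one candidate, so it cannot detect complete silence.
    """
    powers = [steered_power(channels, steering_delays(mics, az, sample_rate))
              for az in candidates_deg]
    peak = max(powers)
    return [az for az, p in zip(candidates_deg, powers) if p >= ratio * peak]


if __name__ == "__main__":
    fs = 16000
    mics = circular_array(8, 0.1)
    rng = random.Random(0)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(2000)]  # broadband stand-in for speech
    recording = simulate_source(mics, 60.0, noise, fs)
    # Candidate azimuths would come from the visual stage; these are made up.
    print(confirm_candidates(recording, mics, [60.0, 200.0], fs))
```

In this sketch the visual stage only has to supply a short list of candidate azimuths, so the audio stage evaluates a handful of steering directions instead of scanning the full circle. That mirrors the division of labor the abstract describes, in which vision proposes potential speakers and audio confirms them.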
