Audio localization in The Automatic Cameraman.


Abstract

This dissertation studies the audio localization component of a touchless interactive display located in the CSE building at UC San Diego. The display, named The Automatic Cameraman (TAC), consists of four large television displays, a PTZ camera, and a microphone array. In this work, we propose a simple solution to the problem of accurately pointing the PTZ camera at speaking humans who are interacting with TAC.

The focus of this dissertation is a novel audio localization and tracking algorithm based on what we call the coordinate-free approach. Previous approaches to localization assume a precisely known geometry for the microphone array, expressed through a coordinate system for the room with an exact position for each microphone element. As a result, arrays are typically built so that microphone positions can be known easily, e.g., as linear or planar arrays with fixed spacing. The coordinate-free method we propose requires no knowledge of such a coordinate system, allowing for ad hoc placement of microphones.

Our coordinate-free localization algorithm takes a statistical approach, learning a mapping from observed time delays between microphone pairs directly to a pan and tilt directive for the PTZ camera. In addition, we explicitly exploit the fact that the training set of time-delay vectors lies on a low-dimensional structure, namely a three-dimensional structure governed by the sound source's true spatial location. We explore various regressor models, with special attention to those known to exploit this intrinsic low dimensionality.

We follow this with a study of a particle-filtering-based tracker of the time delays between microphones. Our tracker employs a novel approach to the particle filtering problem based on online learning. It introduces a new, practically useful particle resampling scheme. It is also more robust to model misspecification than traditional particle filters.

In the final part of the dissertation, we examine a MEMS digital-microphone-based array that we recently implemented on an FPGA. We explore how this digital array will alleviate many of the technical deficiencies of the current analog array in TAC.
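
The coordinate-free idea described in the abstract, mapping time-delay vectors straight to pan and tilt without knowing microphone coordinates, can be illustrated with a small simulation. The sketch below is not the dissertation's method: it swaps in a plain k-nearest-neighbor regressor as one example of a local model that benefits from low intrinsic dimensionality. The room dimensions, the seven-microphone count, the camera position, and all function names are assumptions made up for illustration. The microphone positions are used only to simulate delays and are never shown to the learner.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def time_delays(source, mics):
    """Time-delay vector over all microphone pairs (i < j), in seconds."""
    dist = np.linalg.norm(mics - source, axis=1)
    i, j = np.triu_indices(len(mics), k=1)
    return (dist[i] - dist[j]) / SPEED_OF_SOUND


def pan_tilt(source, camera):
    """Pan and tilt angles (radians) that point the camera at the source."""
    d = source - camera
    pan = np.arctan2(d[1], d[0])
    tilt = np.arctan2(d[2], np.hypot(d[0], d[1]))
    return np.array([pan, tilt])


def knn_predict(train_x, train_y, query, k=5):
    """Average the pan/tilt targets of the k nearest training delay vectors."""
    dists = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return train_y[nearest].mean(axis=0)


rng = np.random.default_rng(0)

# Hypothetical ad hoc microphone placement; used only by the simulator.
mics = rng.uniform([0.0, 0.0, 0.0], [1.0, 0.2, 1.0], size=(7, 3))
camera = np.array([0.5, 0.0, 1.5])  # hypothetical camera position


def sample_sources(n):
    """Random speaker positions in front of the display (assumed region)."""
    return rng.uniform([-2.0, 1.0, 0.5], [3.0, 5.0, 2.0], size=(n, 3))


# Training set: (delay vector, pan/tilt) pairs, as if collected by
# pointing the camera at a speaker while recording the delays.
train_src = sample_sources(2000)
train_x = np.array([time_delays(s, mics) for s in train_src])
train_y = np.array([pan_tilt(s, camera) for s in train_src])

# Evaluate on unseen source positions.
test_src = sample_sources(200)
pred = np.array(
    [knn_predict(train_x, train_y, time_delays(s, mics)) for s in test_src]
)
truth = np.array([pan_tilt(s, camera) for s in test_src])
mean_abs_err_deg = np.degrees(np.abs(pred - truth).mean(axis=0))
print("mean absolute pan/tilt error (degrees):", mean_abs_err_deg)
```

Although each delay vector here has 21 entries (one per microphone pair), all of them are generated by a three-dimensional source position, which is why a nearest-neighbor lookup in delay space can work with a modest training set. The simulation is noise-free; real delay estimates would be noisy and would likely need the kind of tracking the abstract describes.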

Bibliographic record

  • Author: Ettinger, Evan Ira
  • Author affiliation: University of California, San Diego
  • Degree-granting institution: University of California, San Diego
  • Subjects: Statistics; Artificial Intelligence; Computer Science
  • Degree: Ph.D.
  • Year: 2010
  • Pagination: 105 p.
  • Total pages: 105
  • Original format: PDF
  • Language of text: English
