
Computational models for binaural sound source localization and sound understanding.

Abstract

As one of the primary human senses, the auditory system plays an important role in language acquisition. This thesis proposes computational models for binaural sound source localization and for sound source understanding. Together, the models form the basic auditory system of a mobile robot intended to learn language automatically through multisensory input and interaction with its environment.

The localization model follows a hypothesis-driven approach. Using only binaural input, it achieves three-dimensional (3D) localization by combining multiple cues extracted from the input sounds: two binaural cues, interaural time differences (ITDs) and interaural intensity differences (IIDs), plus monaural spectral cues. Decisions are made within a hierarchical framework based on Bayes' rule, and simulations demonstrate the model's effectiveness. A robust ITD estimation algorithm is also introduced and implemented on the robot, with satisfactory results in real-world environments. Finally, a vision-aided multimodal learning scheme is proposed that lets the robot learn 3D binaural localization autonomously, without any human instructor.

A generic model is also presented for sound source understanding; building it requires no labelled training data. Each sound is represented as a histogram, which preserves its time-varying characteristics, and histogram intersection serves as the similarity measure between sounds. The model has been applied successfully to content-based audio information retrieval and to automatic audio indexing systems.
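
The abstract names ITDs and IIDs as the two binaural cues but does not describe how the thesis's robust ITD estimator works. As a point of reference only, the sketch below shows the common textbook baseline: the ITD taken as the lag that maximizes the cross-correlation between the two ear signals, and the IID taken as the left/right energy ratio in decibels. The function names, the 1 ms lag limit, and the single broadband channel are assumptions for illustration, not the thesis's method.

```python
import numpy as np


def estimate_itd(left, right, fs, max_itd=1e-3):
    """Baseline ITD estimate in seconds via broadband cross-correlation.

    Positive values mean the sound reaches the left ear first (the right
    signal lags). max_itd (assumed ~1 ms for a human-sized head) limits the
    search to physically plausible lags.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    # c[k] = sum_n right[n + k] * left[n]; output index i holds lag i - (len(left) - 1)
    corr = np.correlate(right, left, mode="full")
    lags = np.arange(-(len(left) - 1), len(right))
    max_lag = int(round(max_itd * fs))
    plausible = np.abs(lags) <= max_lag
    best_lag = lags[plausible][np.argmax(corr[plausible])]
    return best_lag / fs


def estimate_iid(left, right, eps=1e-12):
    """Broadband IID in dB: positive when the left-ear signal is more intense."""
    e_left = np.sum(np.square(left)) + eps
    e_right = np.sum(np.square(right)) + eps
    return 10.0 * np.log10(e_left / e_right)
```

At a 16 kHz sampling rate, a right channel delayed by 5 samples gives an ITD of 5 / 16000 s = 0.3125 ms. Plain cross-correlation is known to degrade under reverberation and noise, which is why more robust estimators are needed in real rooms.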
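
The abstract says a hierarchical framework based on Bayes' rule makes the localization decision but does not give its structure. The sketch below shows only the core idea at a single level: candidate directions are discretized, each cue's likelihood is assumed Gaussian around a per-direction template value, cues are assumed conditionally independent, and the likelihoods are multiplied with a prior. The templates, noise widths, and the Gaussian and independence assumptions are all hypothetical. Neither the thesis's hierarchy nor its spectral-cue model is reproduced here.

```python
import numpy as np


def posterior_over_directions(observed, templates, sigmas, prior=None):
    """Fuse localization cues with Bayes' rule over a grid of candidate directions.

    observed:  cue name -> measured value, e.g. {"itd": 2.4e-4, "iid": 4.0}
    templates: cue name -> array of the value expected for each candidate direction
    sigmas:    cue name -> assumed Gaussian noise standard deviation of that cue
    prior:     optional prior probability for each direction (uniform if None)
    Cues are assumed conditionally independent given the direction.
    """
    n = len(next(iter(templates.values())))
    log_post = np.zeros(n) if prior is None else np.log(np.asarray(prior, dtype=float))
    for cue, value in observed.items():
        expected = np.asarray(templates[cue], dtype=float)
        # Gaussian log-likelihood, dropping constants that cancel on normalization
        log_post += -0.5 * ((value - expected) / sigmas[cue]) ** 2
    log_post -= log_post.max()  # guard against underflow in exp
    post = np.exp(log_post)
    return post / post.sum()


# Hypothetical templates for five azimuths (degrees); values are illustrative only
azimuths = np.array([-60, -30, 0, 30, 60])
templates = {
    "itd": np.array([-5e-4, -2.5e-4, 0.0, 2.5e-4, 5e-4]),  # seconds
    "iid": np.array([-10.0, -5.0, 0.0, 5.0, 10.0]),  # dB
}
sigmas = {"itd": 1e-4, "iid": 3.0}
post = posterior_over_directions({"itd": 2.4e-4, "iid": 4.0}, templates, sigmas)
print(azimuths[np.argmax(post)])  # 30
```

Working in log space and subtracting the maximum before exponentiating keeps the product of many small likelihoods from underflowing; each cue's Gaussian normalizing constant is identical across directions, so it can be dropped.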

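
For sound understanding, the abstract specifies a histogram representation compared by histogram intersection, but not how the histogram bins are defined or how time-varying characteristics are kept. The sketch below assumes one common construction: each short-time feature frame is assigned to its nearest entry in a codebook, and the normalized counts form the histogram. The codebook, the feature frames, and the function names are hypothetical, and this simple bag-of-codewords form discards frame order. Histogram intersection itself is the standard measure sum_i min(h1_i, h2_i), which lies between 0 and 1 for normalized histograms.

```python
import numpy as np


def sound_histogram(frames, codebook):
    """Normalized histogram of nearest-codeword labels for a sequence of frames.

    frames:   (num_frames, dim) array of short-time features (hypothetical input)
    codebook: (num_codewords, dim) array of reference vectors (hypothetical input)
    """
    frames = np.asarray(frames, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Squared Euclidean distance from every frame to every codeword
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    counts = np.bincount(labels, minlength=len(codebook)).astype(float)
    return counts / counts.sum()


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima; 1.0 means identical normalized histograms."""
    return float(np.minimum(h1, h2).sum())


def rank_by_similarity(query_hist, database):
    """Return database keys ordered from most to least similar to the query."""
    return sorted(
        database,
        key=lambda k: histogram_intersection(query_hist, database[k]),
        reverse=True,
    )


# Toy 1-D features and a three-entry codebook, for illustration only
codebook = np.array([[0.0], [1.0], [2.0]])
query = sound_histogram([[0.1], [0.9], [1.1], [2.0]], codebook)  # [0.25, 0.5, 0.25]
database = {
    "clip_a": sound_histogram([[0.0], [0.0], [1.0], [2.1]], codebook),  # [0.5, 0.25, 0.25]
    "clip_b": sound_histogram([[0.2], [1.0], [1.1], [1.9]], codebook),  # [0.25, 0.5, 0.25]
}
print(rank_by_similarity(query, database))  # ['clip_b', 'clip_a']
```

In a retrieval setting like the one the abstract describes, ranking stored clips by intersection with the query histogram is the basic operation; because intersection only needs bin-wise minima, it is cheap to compute over a large index.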