Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Co-Localization of Audio Sources in Images Using Binaural Features and Locally-Linear Regression

Abstract

This paper addresses the problem of localizing audio sources using binaural measurements. We propose a supervised formulation that simultaneously localizes multiple sources at different locations. The approach is intrinsically efficient because, contrary to prior work, it relies neither on source separation nor on monaural segregation. The method starts with a training stage that establishes a locally linear Gaussian regression model between the directional coordinates of all the sources and the auditory features extracted from binaural measurements. While fixed-length wide-spectrum sounds (white noise) are used for training to reliably estimate the model parameters, we show that testing (localization) can be extended to variable-length sparse-spectrum sounds (such as speech), thus enabling a wide range of realistic applications. Indeed, we demonstrate that the method can be used for audio-visual fusion, namely to map speech signals onto images and hence to spatially align the audio and visual modalities, which makes it possible to discriminate between speaking and non-speaking faces. We release a novel corpus of real-room recordings that allows quantitative evaluation of the co-localization method in the presence of one or two sound sources. Experiments demonstrate increased accuracy and speed relative to several state-of-the-art methods.
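
The idea of a locally linear regression from auditory features to source directions can be illustrated with a toy sketch. This is not the authors' model: the abstract describes a probabilistic, locally linear Gaussian regression, whereas the code below swaps in a simpler stand-in that partitions the feature space with hard k-means assignments and fits one affine least-squares map per region. The function names `fit_locally_linear` and `predict_locally_linear`, the number of regions, and the array shapes are all assumptions made for illustration.

```python
import numpy as np


def _nearest(X, centers):
    # Index of the closest center (squared Euclidean distance) for each row of X.
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def fit_locally_linear(X, Y, n_regions=16, n_iter=25, seed=0):
    """Fit a piecewise-affine map from features X (n, d) to directions Y (n, p).

    Hypothetical helper: hard k-means regions, one affine model per region.
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_regions, replace=False)].astype(float)
    for _ in range(n_iter):  # plain k-means on the feature vectors
        labels = _nearest(X, centers)
        for k in range(n_regions):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    labels = _nearest(X, centers)  # assignment consistent with the final centers

    Xa = np.hstack([X, np.ones((len(X), 1))])  # extra column of ones: affine offset
    global_W = np.linalg.lstsq(Xa, Y, rcond=None)[0]
    weights = []
    for k in range(n_regions):
        mask = labels == k
        if mask.sum() > Xa.shape[1]:
            weights.append(np.linalg.lstsq(Xa[mask], Y[mask], rcond=None)[0])
        else:
            weights.append(global_W)  # too few samples: fall back to the global fit
    return centers, np.stack(weights), global_W


def predict_locally_linear(model, X):
    """Map each feature vector through the affine model of its nearest region."""
    centers, weights, _ = model
    labels = _nearest(X, centers)
    Xa = np.hstack([X, np.ones((len(X), 1))])
    # weights[labels] has shape (n, d + 1, p): one affine map per sample.
    return np.einsum("nd,ndp->np", Xa, weights[labels])
```

In this sketch, `X` would hold one binaural feature vector per training sound and `Y` the matching direction coordinates (for example, two columns for a horizontal and a vertical image coordinate). Calling `fit_locally_linear(X_train, Y_train)` returns a model that `predict_locally_linear(model, X_test)` applies to new feature vectors. Hard region assignment keeps the example short; a probabilistic mixture would instead weight several local affine maps per sample.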