IEEE International Conference on Acoustics, Speech and Signal Processing

Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain


Abstract

Estimating the positions of multiple speakers can be helpful for tasks like automatic speech recognition or speaker diarization. Both applications benefit from a known speaker position when, for instance, applying beamforming or assigning unique speaker identities. Recently, several approaches utilizing acoustic signals augmented with visual data have been proposed for this task. However, both the acoustic and the visual modality may be corrupted in specific spatial regions, for instance due to poor lighting conditions or to the presence of background noise. This paper proposes a novel audiovisual data fusion framework for speaker localization by assigning individual dynamic stream weights to specific regions in the localization space. This fusion is achieved via a neural network, which combines the predictions of individual audio and video trackers based on their time- and location-dependent reliability. A performance evaluation using audiovisual recordings yields promising results, with the proposed fusion approach outperforming all baseline models.
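The core idea — weighting each modality differently in different parts of the localization space — can be illustrated with a small sketch. The snippet below performs a log-linear fusion of audio and video localization posteriors over a discretized grid, with a per-location stream weight deciding how much each modality is trusted at each position. This is only a minimal sketch under stated assumptions: in the paper the weights are produced by a neural network from time- and location-dependent reliability cues, whereas here they are simply supplied by the caller, and the function name `fuse_spatial_stream_weights` is hypothetical.

```python
import numpy as np

def fuse_spatial_stream_weights(p_audio, p_video, weights, eps=1e-12):
    """Log-linear fusion of audio and video localization posteriors.

    p_audio, p_video: per-location posterior probabilities over the
        same discretized localization grid (arrays of equal shape).
    weights: per-location dynamic stream weights in [0, 1]; 1.0 means
        "trust audio only" at that location, 0.0 means "trust video only".
        (In the paper these come from a learned network; here they are
        a fixed, hand-chosen map for illustration.)
    """
    # Weighted sum in the log domain, i.e. p_a(x)^w(x) * p_v(x)^(1-w(x))
    log_fused = weights * np.log(p_audio + eps) \
        + (1.0 - weights) * np.log(p_video + eps)
    # Subtract the max for numerical stability, then renormalize
    fused = np.exp(log_fused - log_fused.max())
    return fused / fused.sum()

# Toy example: 5 candidate azimuth bins.
p_a = np.array([0.10, 0.20, 0.40, 0.20, 0.10])  # audio tracker posterior
p_v = np.array([0.05, 0.10, 0.10, 0.70, 0.05])  # video tracker posterior
# Hypothetical reliability map: trust audio on the left, video on the right
w = np.array([0.9, 0.9, 0.9, 0.2, 0.2])
p_av = fuse_spatial_stream_weights(p_a, p_v, w)
```

With these toy values the fused posterior peaks at bin 3: the video tracker is confident there, and the weight map assigns the video stream high reliability in that region, so it dominates despite the audio tracker preferring bin 2.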

