Ava Active Speaker: An Audio-Visual Dataset for Active Speaker Detection

Abstract

Active speaker detection is an important component in video analysis algorithms for applications such as speaker diarization, video re-targeting for meetings, speech enhancement, and human-robot interaction. The absence of a large, carefully labeled audio-visual active speaker dataset has limited evaluation in terms of data diversity, environments, and accuracy. In this paper, we present the AVA Active Speaker detection dataset (AVA-ActiveSpeaker), which has been publicly released to facilitate algorithm development and comparison. It contains temporally labeled face tracks in videos, where each face instance is labeled as speaking or not, and whether the speech is audible. The dataset contains about 3.65 million human-labeled frames spanning 38.5 hours. We also introduce a state-of-the-art, jointly trained audio-visual model for real-time active speaker detection and compare several variants. The evaluation clearly demonstrates a significant gain due to audio-visual modeling and temporal integration over multiple frames.

Bibliographic record

Source
IEEE International Conference on Acoustics, Speech and Signal Processing | 2020 | pp. 4492-4496 | 5 pages
Authors
Joseph Roth; Sourish Chaudhuri; Ondrej Klejch; Radhika Marvin; Andrew Gallagher; Liat Kaver; Sharadh Ramaswamy; Arkadiusz Stopczynski; Cordelia Schmid; Zhonghua Xi; Caroline Pantofaru
Original format
PDF
Keywords
multimodal; audio-visual; active speaker detection; dataset

Similar literature

1. Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially Aware Language Acquisition [J]. Stefanov Kalin, Beskow Jonas, Salvi Giampiero. IEEE Transactions on Cognitive and Developmental Systems. 2020, No. 2
2. Acoustic response of a curved active PVDF-paper/fabric speaker for active noise control of automotive interior noise [J]. Tilak Dias, Ravindra Monaragala, Manuchehr Soleimani. Measurement Science & Technology. 2007, No. 5
3. Easy-to-build Active Hifi Bookshelf Speakers Part 3: Building the Optional Subwoofers [J]. Phil Prosser. Silicon Chip. 2020, No. 3
4. Ava Active Speaker: An Audio-Visual Dataset for Active Speaker Detection [C]. Joseph Roth, Sourish Chaudhuri, Ondrej Klejch, et al. IEEE International Conference on Acoustics, Speech and Signal Processing. 2020
5. Active learning with support vector machines for imbalanced datasets and a method for stopping active learning based on stabilizing predictions [D]. Bloodgood, Michael. 2009
6. Multimodal Speaker Diarization Using a Pre-Trained Audio-Visual Synchronization Model [O]. Rehan Ahmad, Syed Zubair, Hani Alquhayz, 2019
7. Active speaker detection with audio-visual co-training [O]. Chakravarty Jay, Zegers Jeroen, Tuytelaars Tinne, 2016
