See the Sound, Hear the Pixels

Abstract

For every event occurring in the real world, a sound is most often associated with the corresponding visual scene. Humans possess an inherent ability to automatically map audio content to visual scenes, leading to an effortless and enhanced understanding of the underlying event. This triggers an interesting question: can this natural correspondence between video and audio, which has been only sparsely explored so far, be learned by a machine and modeled jointly to localize the sound source in a visual scene? In this paper, we propose a novel algorithm that addresses the problem of localizing the sound source in unconstrained videos using efficient fusion and attention mechanisms. Two novel blocks, namely the Audio Visual Fusion Block (AVFB) and the Segment-Wise Attention Block (SWAB), have been developed for this purpose. Quantitative and qualitative evaluations show that the same algorithm, with minor modifications, can serve the purpose of sound localization under three different types of learning: supervised, weakly supervised and unsupervised. A novel Audio Visual Triplet Gram Matrix Loss (AVTGML) has been proposed as a loss function to learn the localization in an unsupervised way. Our empirical evaluations demonstrate a significant increase in performance over the existing state-of-the-art methods, serving as a testimony to the superiority of our proposed approach.
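The abstract names the AVTGML but does not spell out its formulation. As a rough, hypothetical sketch of what a triplet loss computed over Gram matrices of audio and visual features could look like (the tensor shapes, function names, and margin value below are illustrative assumptions, not the paper's definition), consider the following PyTorch snippet:

    import torch
    import torch.nn.functional as F

    def gram_matrix(feats):
        # feats: (batch, channels, positions) -> (batch, channels, channels)
        # Channel-wise correlations, normalized by the number of positions.
        return feats @ feats.transpose(1, 2) / feats.size(-1)

    def av_triplet_gram_loss(audio_anchor, visual_pos, visual_neg, margin=1.0):
        # All inputs: (batch, channels, positions), e.g. visual features flattened
        # over spatial locations and audio features over time frames, after being
        # projected to a shared channel dimension (an assumption of this sketch).
        g_a = gram_matrix(audio_anchor)
        g_p = gram_matrix(visual_pos)
        g_n = gram_matrix(visual_neg)
        d_pos = (g_a - g_p).flatten(1).norm(dim=1)    # anchor vs. matching visual
        d_neg = (g_a - g_n).flatten(1).norm(dim=1)    # anchor vs. mismatched visual
        return F.relu(d_pos - d_neg + margin).mean()  # standard triplet hinge

    # Example: batch of 4, 128 shared channels, 49 visual positions / audio frames
    audio = torch.randn(4, 128, 49)
    vis_pos = torch.randn(4, 128, 49)
    vis_neg = torch.randn(4, 128, 49)
    loss = av_triplet_gram_loss(audio, vis_pos, vis_neg)

Because the Gram matrices have shape (channels, channels) regardless of how many positions each modality contributes, this kind of loss can compare audio and visual streams of different lengths; the actual AVTGML may differ in normalization, distance measure, or triplet construction.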
