IEEE/CVF Conference on Computer Vision and Pattern Recognition

Visual to Sound: Generating Natural Sound for Videos in the Wild



Abstract

As two of the five traditional human senses (sight, hearing, taste, smell, and touch), vision and sound are basic sources through which humans understand the world. Often correlated during natural events, these two modalities combine to jointly affect human perception. In this paper, we pose the task of generating sound given visual input. Such capabilities could help enable applications in virtual reality (generating sound for virtual scenes automatically) or provide additional accessibility to images or videos for people with visual impairments. As a first step in this direction, we apply learning-based methods to generate raw waveform samples given input video frames. We evaluate our models on a dataset of videos containing a variety of sounds (such as ambient sounds and sounds from people/animals). Our experiments show that the generated sounds are fairly realistic and have good temporal synchronization with the visual inputs.
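To make the setup concrete, below is a minimal sketch of frame-conditioned waveform generation: a small CNN encodes each video frame, the per-frame features are upsampled to the audio sample rate, and an autoregressive recurrent decoder predicts quantized waveform samples one at a time. This is an illustration only; the module names, layer sizes, frame/sample counts, and the GRU decoder are assumptions and simplifications for readability, not the paper's actual architecture (which the abstract does not specify).

```python
# Simplified sketch of video-frame-conditioned raw waveform generation.
# All names, sizes, and rates below are illustrative assumptions.
import torch
import torch.nn as nn

class FrameEncoder(nn.Module):
    """Encodes each video frame into a feature vector (placeholder CNN)."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, frames):  # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        x = self.conv(frames.flatten(0, 1)).flatten(1)  # (B*T, 64)
        return self.fc(x).view(b, t, -1)                # (B, T, feat_dim)

class WaveformDecoder(nn.Module):
    """Autoregressive decoder over quantized (e.g. 8-bit mu-law) samples,
    conditioned on visual features upsampled to the audio rate."""
    def __init__(self, feat_dim=256, hidden=512, quant=256):
        super().__init__()
        self.embed = nn.Embedding(quant, 64)
        self.rnn = nn.GRU(64 + feat_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, quant)

    def forward(self, prev_samples, frame_feats, samples_per_frame):
        # prev_samples: (B, N) quantized audio; frame_feats: (B, T, feat_dim)
        cond = frame_feats.repeat_interleave(samples_per_frame, dim=1)
        cond = cond[:, :prev_samples.size(1)]
        x = torch.cat([self.embed(prev_samples), cond], dim=-1)
        h, _ = self.rnn(x)
        return self.out(h)  # (B, N, quant) logits over the next sample

# Toy usage with made-up rates: 16 frames, ~744 audio samples per frame.
if __name__ == "__main__":
    enc, dec = FrameEncoder(), WaveformDecoder()
    frames = torch.randn(2, 16, 3, 64, 64)
    audio = torch.randint(0, 256, (2, 16 * 744))
    logits = dec(audio[:, :-1], enc(frames), samples_per_frame=744)
    loss = nn.CrossEntropyLoss()(logits.reshape(-1, 256), audio[:, 1:].reshape(-1))
    print(loss.item())
```

At inference time, generation would proceed sample by sample, feeding each predicted sample back into the decoder while the visual conditioning keeps the output temporally aligned with the frames.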


