IEEE/CVF Conference on Computer Vision and Pattern Recognition

Visual to Sound: Generating Natural Sound for Videos in the Wild



Abstract

As two of the five traditional human senses (sight, hearing, taste, smell, and touch), vision and sound are basic sources through which humans understand the world. Often correlated during natural events, these two modalities combine to jointly affect human perception. In this paper, we pose the task of generating sound given visual input. Such capabilities could help enable applications in virtual reality (generating sound for virtual scenes automatically) or provide additional accessibility to images or videos for people with visual impairments. As a first step in this direction, we apply learning-based methods to generate raw waveform samples given input video frames. We evaluate our models on a dataset of videos containing a variety of sounds (such as ambient sounds and sounds from people/animals). Our experiments show that the generated sounds are fairly realistic and have good temporal synchronization with the visual inputs.
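To make the setup concrete, below is a minimal sketch of frame-conditioned waveform generation: a small CNN encodes each video frame, the per-frame features are upsampled to the audio sample rate, and an autoregressive recurrent decoder predicts quantized waveform samples one at a time. This is an illustration only; the module names, layer sizes, frame/sample counts, and the GRU decoder are assumptions and simplifications for readability, not the paper's actual architecture (which the abstract does not specify).

```python
# Simplified sketch of video-frame-conditioned raw waveform generation.
# All names, sizes, and rates below are illustrative assumptions.
import torch
import torch.nn as nn

class FrameEncoder(nn.Module):
    """Encodes each video frame into a feature vector (placeholder CNN)."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, frames):  # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        x = self.conv(frames.flatten(0, 1)).flatten(1)  # (B*T, 64)
        return self.fc(x).view(b, t, -1)                # (B, T, feat_dim)

class WaveformDecoder(nn.Module):
    """Autoregressive decoder over quantized (e.g. 8-bit mu-law) samples,
    conditioned on visual features upsampled to the audio rate."""
    def __init__(self, feat_dim=256, hidden=512, quant=256):
        super().__init__()
        self.embed = nn.Embedding(quant, 64)
        self.rnn = nn.GRU(64 + feat_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, quant)

    def forward(self, prev_samples, frame_feats, samples_per_frame):
        # prev_samples: (B, N) quantized audio; frame_feats: (B, T, feat_dim)
        cond = frame_feats.repeat_interleave(samples_per_frame, dim=1)
        cond = cond[:, :prev_samples.size(1)]
        x = torch.cat([self.embed(prev_samples), cond], dim=-1)
        h, _ = self.rnn(x)
        return self.out(h)  # (B, N, quant) logits over the next sample

# Toy usage with made-up rates: 16 frames, ~744 audio samples per frame.
if __name__ == "__main__":
    enc, dec = FrameEncoder(), WaveformDecoder()
    frames = torch.randn(2, 16, 3, 64, 64)
    audio = torch.randint(0, 256, (2, 16 * 744))
    logits = dec(audio[:, :-1], enc(frames), samples_per_frame=744)
    loss = nn.CrossEntropyLoss()(logits.reshape(-1, 256), audio[:, 1:].reshape(-1))
    print(loss.item())
```

At inference time, generation would proceed sample by sample, feeding each predicted sample back into the decoder while the visual conditioning keeps the output temporally aligned with the frames.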


