Audio Engineering Society Convention

Content matching for sound generating objects within a visual scene using a computer vision approach



Abstract

The growth in demand for immersive audio content production and consumption, particularly in VR, is driving the need for tools to facilitate creation. Immersive productions place additional demands on sound design teams, specifically around the increased complexity of scenes, the increased number of sound-producing objects, and the need to spatialise sound in 360°. This paper presents an initial feasibility study of a methodology that uses visual object detection to detect, track, and match content for sound-generating objects, in this case within a simple 2D visual scene. Results show that, while the approach is successful for a single moving object, limitations in the current computer vision system cause complications for scenes with multiple objects. Results also show that the recommendation of candidate sound-effect files depends heavily on the accuracy of the visual object detection system and on the labelling of the audio repository used.
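The matching step the abstract describes — pairing labels from a visual object detector with entries in a labelled sound-effect repository — can be sketched as a simple tag-lookup. This is an illustrative reconstruction, not the paper's implementation; all names (`recommend_sfx`, the repository filenames and tags) are hypothetical:

```python
# Sketch of the content-matching step: detected object labels are
# matched against a labelled sound-effect repository. All labels,
# filenames, and tags below are illustrative, not from the paper.

def recommend_sfx(detected_labels, repository):
    """For each detector label, return the repository files whose
    tags contain that label (case-insensitive)."""
    candidates = {}
    for label in detected_labels:
        matches = [name for name, tags in repository.items()
                   if label.lower() in (t.lower() for t in tags)]
        candidates[label] = matches
    return candidates

# Hypothetical repository: filename -> human-assigned tags. The paper
# notes that recommendation quality hinges on this labelling.
repository = {
    "engine_loop_01.wav": ["car", "engine", "vehicle"],
    "dog_bark_03.wav": ["dog", "bark", "animal"],
    "crowd_amb_02.wav": ["crowd", "ambience"],
}

print(recommend_sfx(["car", "dog"], repository))
# → {'car': ['engine_loop_01.wav'], 'dog': ['dog_bark_03.wav']}
```

As the abstract observes, a mislabelled repository entry or a wrong detector label breaks this lookup entirely, which is why the recommendation quality is bound to both the detection accuracy and the repository labelling.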
