NTT Technical Review

Media Scene Learning: A Novel Framework for Automatically Extracting Meaningful Parts from Audio and Video Signals



Abstract

We describe a novel framework called Media Scene Learning (MSL) for automatically extracting key components such as the sound of a single instrument from a given audio signal or a target object from a given video signal. In particular, we introduce two key methods: 1) the Composite Auto-Regressive System (CARS) for decomposing audio signals into several sound components on the basis of a generative model of sounds and 2) Saliency-Based Image Learning (SBIL) for extracting object-like regions from a given video signal on the basis of the characteristics of the human visual system.
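The abstract does not spell out how CARS is formulated, only that it decomposes an audio signal on the basis of a generative, autoregressive model of sounds. As a rough, hypothetical illustration of the basic building block of such models (not the CARS decomposition itself), the sketch below estimates an all-pole (autoregressive) spectral envelope for a single audio frame using the standard autocorrelation (Yule-Walker) method:

import numpy as np

def ar_envelope(frame, order=12, n_fft=512):
    # Estimate an all-pole (autoregressive) spectral envelope for one
    # audio frame using the autocorrelation (Yule-Walker) method.
    x = frame * np.hanning(len(frame))
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]  # lags 0..order
    # Solve the Yule-Walker normal equations R a = -r[1:] for the AR coefficients.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * np.eye(order)                      # small ridge term for stability
    a = np.concatenate(([1.0], np.linalg.solve(R, -r[1:order + 1])))
    gain = r[0] + np.dot(a[1:], r[1:order + 1])    # prediction-error power
    # Power response of the all-pole filter, gain / |A(e^{jw})|^2, on an FFT grid.
    A = np.fft.rfft(a, n_fft)
    return gain / (np.abs(A) ** 2 + 1e-12)

Calling ar_envelope on successive frames of a recording (e.g. ar_envelope(signal[i:i + 1024])) yields the smooth spectral envelopes that AR-based source models, including composite ones, build upon.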
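Likewise, SBIL is characterized here only as saliency-driven extraction of object-like regions. As a minimal stand-in for the general idea, the following sketch uses the well-known spectral-residual saliency method (Hou & Zhang, 2007) rather than the authors' algorithm:

import numpy as np
from scipy.ndimage import uniform_filter, gaussian_filter

def spectral_residual_saliency(gray):
    # Saliency map of a grayscale image via the spectral-residual method:
    # keep what deviates from the locally averaged log amplitude spectrum.
    f = np.fft.fft2(gray)
    log_amp = np.log(np.abs(f) + 1e-8)
    residual = log_amp - uniform_filter(log_amp, size=3)
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * np.angle(f)))) ** 2
    return gaussian_filter(sal, sigma=3)

def object_like_mask(gray, k=3.0):
    # Threshold the saliency map to obtain candidate object-like regions.
    sal = spectral_residual_saliency(gray)
    return sal > k * sal.mean()

A mask like this merely marks conspicuous regions in a single frame; the SBIL method described in the paper additionally learns which of these regions correspond to objects across the video signal.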

