Multimedia Tools and Applications

Multimodal extraction of events and of information about the recording activity in user generated videos


Abstract

In this work we propose methods that exploit context sensor data modalities to detect interesting events and to extract high-level contextual information about the recording activity in user generated videos. Most camera-enabled electronic devices contain auxiliary sensors such as accelerometers, compasses, and GPS receivers. Data captured by these sensors during media acquisition have already been used to compensate for camera degradations such as shake and to provide basic tagging information such as location. However, exploiting the sensor-data modality for higher-level information extraction, such as the detection of interesting events, has received rather limited attention, and prior work has been further constrained to specialized acquisition setups. We first show how these sensor modalities allow inferring information about each individual video recording, namely camera movements and content degradations. In addition, we consider a multi-camera scenario in which multiple user generated recordings of a common scene (e.g., a music concert) are available. For this kind of scenario we jointly analyze the multiple video recordings and their associated sensor modalities to extract higher-level semantics of the recorded media: based on the orientations of the cameras we identify the region of interest of the recorded scene, and by exploiting correlation in the motion of the different cameras we detect generic interesting events and estimate their relative positions. Furthermore, by also analyzing the audio content captured by multiple users we detect more specific interesting events. We show that the proposed multimodal analysis methods perform well on various recordings obtained at real live music performances.
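
The abstract describes the methods only at a high level, and the paper's actual algorithms are not reproduced on this page. As a minimal sketch of the orientation-based idea, the snippet below intersects the compass viewing rays of several cameras in a least-squares sense to locate a shared region of interest. The function name, the local east/north coordinate frame, and all numeric values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def estimate_roi_center(positions, headings_deg):
    """Least-squares intersection of camera viewing rays (hypothetical helper).

    positions    -- (N, 2) camera positions in a local east/north frame
                    (metres), e.g. derived from GPS fixes.
    headings_deg -- (N,) compass headings in degrees (0 = north, clockwise),
                    e.g. from each device's magnetometer.
    Returns the point minimising the summed squared distance to all
    viewing lines -- a simple proxy for the region of interest.
    """
    positions = np.asarray(positions, dtype=float)
    theta = np.radians(np.asarray(headings_deg, dtype=float))
    # Unit viewing directions: compass 0 deg = +north (y), 90 deg = +east (x).
    d = np.stack([np.sin(theta), np.cos(theta)], axis=1)
    A = np.zeros((2, 2))
    b = np.zeros(2)
    for c, di in zip(positions, d):
        P = np.eye(2) - np.outer(di, di)  # projector orthogonal to the ray
        A += P
        b += P @ c
    return np.linalg.solve(A, b)

# Three cameras around a stage, all roughly aimed at the origin.
cams = [(-20.0, -30.0), (0.0, -40.0), (25.0, -28.0)]
hdgs = [33.7, 0.0, -41.8]  # degrees from north, clockwise
print(estimate_roi_center(cams, hdgs))  # close to (0, 0)
```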
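
The correlation-of-motion idea can likewise be sketched as a simple voting scheme: assuming synchronized per-camera angular-speed streams derived from the compass, a time window is flagged as a candidate interesting event when a large fraction of the cameras pan simultaneously. The window length, thresholds, and sampling rate below are assumed values for illustration, not parameters from the paper.

```python
import numpy as np

def detect_joint_panning(angular_speed, fs=10.0, win_s=2.0,
                         speed_thr=20.0, frac_thr=0.5):
    """Flag windows in which many cameras pan at the same time (sketch).

    angular_speed -- (N_cams, T) absolute compass angular speed in deg/s,
                     one row per camera, assumed pre-aligned on a common clock.
    fs            -- sampling rate of the sensor streams in Hz (assumed).
    win_s         -- analysis window length in seconds (assumed).
    speed_thr     -- per-camera panning threshold in deg/s (assumed).
    frac_thr      -- fraction of cameras that must pan simultaneously.
    Returns the start times (s) of candidate interesting events.
    """
    x = np.abs(np.asarray(angular_speed, dtype=float))
    n_cams, T = x.shape
    w = int(win_s * fs)
    events = []
    for start in range(0, T - w + 1, w):
        window = x[:, start:start + w]
        # A camera "pans" in this window if its peak angular speed is high.
        panning = window.max(axis=1) > speed_thr
        if panning.mean() >= frac_thr:
            events.append(start / fs)
    return events

# Synthetic demo: three cameras, all panning around t = 5 s.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 5.0, size=(3, 100))  # idle jitter, deg/s at 10 Hz
x[:, 50:60] += 40.0                       # joint pan at t = 5..6 s
print(detect_joint_panning(x))            # -> [4.0], window covering the pan
```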

