International Journal of Multimedia Information Retrieval

Video concept detection by audio-visual grouplets


Abstract

We investigate general concept classification in unconstrained videos by joint audio-visual analysis. An audio-visual grouplet (AVG) representation is proposed based on analyzing the statistical temporal audio-visual interactions. Each AVG contains a set of audio and visual codewords that are grouped together according to their strong temporal correlations in videos, and the AVG carries unique audio-visual cues to represent the video content. By using entire AVGs as building elements, video concepts can be classified more robustly than with traditional vocabularies of discrete audio or visual codewords. Specifically, we conduct coarse-level foreground/background separation in both the audio and visual channels, and discover four types of AVGs by exploring mixed-and-matched temporal audio-visual correlations among the following factors: visual foreground, visual background, audio foreground, and audio background. All of these types of AVGs provide discriminative audio-visual patterns for classifying various semantic concepts. To use the AVGs effectively for improved concept classification, a distance metric learning algorithm is further developed. Based on the AVG structure, the algorithm uses an iterative quadratic programming formulation to learn the optimal distances between data points in the large-margin nearest-neighbor setting. Various types of grouplet-based distances can be computed using individual AVGs, and through our distance metric learning algorithm these grouplet-based distances can be aggregated for final classification. We extensively evaluate our method over the large-scale Columbia consumer video set. Experiments demonstrate that the AVG-based audio-visual representation achieves consistent and significant performance improvements compared with other state-of-the-art approaches.
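To make the grouplet idea concrete, here is a minimal illustrative sketch (not the paper's implementation, which uses foreground/background separation and four AVG types): given per-frame activation histograms of audio and visual codewords, pairs whose activation time series are strongly temporally correlated are grouped into an AVG. The function name, the threshold, and the simple one-audio-codeword-per-grouplet structure are all assumptions made for illustration.

```python
# Illustrative sketch, assuming per-frame codeword activations are available.
# Not the authors' algorithm: real AVGs also involve foreground/background
# separation and a learned distance metric over grouplet-based distances.
import numpy as np

def build_grouplets(audio_act, visual_act, thresh=0.8):
    """audio_act: (T, A) activations of A audio codewords over T frames.
    visual_act: (T, V) activations of V visual codewords.
    Returns a list of (audio_ids, visual_ids) grouplets whose time series
    have absolute Pearson correlation >= thresh."""
    A = audio_act.shape[1]
    # corrcoef stacks row-variables; take the audio-vs-visual block.
    corr = np.corrcoef(audio_act.T, visual_act.T)[:A, A:]
    grouplets = []
    for a in range(A):
        linked = np.where(np.abs(corr[a]) >= thresh)[0]
        if linked.size:
            grouplets.append(([a], linked.tolist()))
    return grouplets
```

In this toy form, each grouplet ties one audio codeword to the visual codewords that co-fluctuate with it over time; a classifier can then treat each grouplet (rather than each isolated codeword) as a feature unit, which is the representational shift the abstract describes.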


