Evaluation of active learning strategies for video indexing

Abstract

In this paper, we compare active learning strategies for indexing concepts in video shots. Active learning is simulated using subsets of a fully annotated data set instead of actually calling for user intervention. Training is done using the collaborative annotation of 39 concepts of the TRECVID 2005 campaign. Performance is measured on the 20 concepts selected for the TRECVID 2006 concept detection task. The simulation allows exploring the effect of several parameters: the strategy, the annotated fraction of the data set, the size of the data set, the number of iterations and the relative difficulty of concepts. Three strategies were compared. The first two, respectively, select the most probable and the most uncertain samples. The third one is a random choice. For easy concepts, the "most probable" strategy is the best one when less than 15% of the data set is annotated and the "most uncertain" strategy is the best one when 15% or more of the data set is annotated. The "most probable" and "most uncertain" strategies are roughly equivalent for moderately difficult and difficult concepts. In all cases, the maximum performance is reached when 12-15% of the whole data set is annotated. This result is, however, dependent upon the step size and the training set size. One-fortieth of the training set size is a good value for the step size. The size of the subset of the training set that has to be annotated in order to reach the maximum achievable performance varies with the square root of the training set size. The "most probable" strategy is more "recall oriented" and the "most uncertain" strategy is more "precision oriented". (c) 2007 Elsevier B.V. All rights reserved.
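The three selection strategies compared in the abstract are straightforward to simulate. Below is a minimal sketch of such a simulation loop in Python, assuming a scikit-learn-style binary classifier exposing predict_proba and a user-supplied evaluation callback; the names train_classifier and score, the seed-set size, and the default step of one-fortieth of the training set are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

def select_batch(scores, unlabeled_idx, step, strategy, rng):
    """Pick the next batch of samples to annotate from the unlabeled pool.

    scores: estimated P(concept present) for every sample in the data set.
    strategy: 'most_probable', 'most_uncertain', or 'random'.
    """
    pool_scores = scores[unlabeled_idx]
    if strategy == "most_probable":
        order = np.argsort(-pool_scores)               # highest scores first
    elif strategy == "most_uncertain":
        order = np.argsort(np.abs(pool_scores - 0.5))  # closest to the decision boundary first
    elif strategy == "random":
        order = rng.permutation(len(unlabeled_idx))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return unlabeled_idx[order[:step]]

def simulate_active_learning(features, labels, train_classifier, score,
                             strategy="most_uncertain", seed_size=100,
                             step=None, max_fraction=0.15, seed=0):
    """Simulate active learning on a fully annotated set: labels are only
    'revealed' when a sample is selected for annotation.  Returns the
    performance measured after each iteration."""
    rng = np.random.default_rng(seed)
    n = len(labels)
    step = step or max(1, n // 40)                     # 1/40 of the training set size (assumed default)
    labeled = rng.choice(n, size=seed_size, replace=False)
    history = []
    while len(labeled) < max_fraction * n:
        model = train_classifier(features[labeled], labels[labeled])
        scores = model.predict_proba(features)[:, 1]   # assumes a sklearn-style classifier
        history.append(score(model))                   # e.g. average precision on a held-out set
        unlabeled = np.setdiff1d(np.arange(n), labeled)
        batch = select_batch(scores, unlabeled, step, strategy, rng)
        labeled = np.concatenate([labeled, batch])     # reveal the labels of the selected batch
    return history
```

Running this loop once per strategy and per concept, and plotting the history against the annotated fraction, reproduces the kind of comparison the abstract describes (performance as a function of the annotated fraction, step size, and concept difficulty).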