Journal: Neurocomputing

Event Bank based multimedia representation via latent group logistic regression minimization

Abstract

Multimedia event detection (MED) in uncontrolled videos requires a very large number of labeled videos to train the event classifier, which becomes challenging when there are many events. Because an event usually involves several spatio-temporal objects, one intuitive solution is to model those objects from a large number of labeled images, which are readily available from standard image datasets such as the ImageNet challenge dataset, and to model their spatio-temporal relationships from a relatively small number of labeled videos, which are also readily available from standard video datasets such as the TRECVID MED 2012 dataset. Accordingly, in this paper we propose a latent group logistic regression (latent GLR) mixture model for those objects and an event bank descriptor for their spatio-temporal relationships. Furthermore, we develop an efficient iterative training algorithm that learns the parameters of each latent GLR mixture model by combining coordinate descent and gradient descent to minimize the ℓ2,1-norm (group) regularized logistic loss function. We also conduct extensive experiments to evaluate object detection performance using the latent GLR mixture model on the ImageNet challenge dataset and event detection performance using the event bank descriptor on the TRECVID MED 2012 dataset. The results demonstrate the effectiveness of both proposed approaches. (C) 2015 Elsevier B.V. All rights reserved.
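The abstract names the ingredients of the training algorithm (coordinate descent, gradient descent, a group-regularized logistic loss) but not its update rules. The sketch below is therefore an assumption, not the paper's method: it cycles over feature groups and updates each with one gradient step followed by group soft-thresholding, and it omits the latent variables and the mixture entirely. The names `objective` and `fit_group_logistic`, the toy data, and the values of `lam`, `step` and `sweeps` are all hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def objective(w, X, y, groups, lam):
    """Mean logistic loss plus lam times the sum of per-group Euclidean norms."""
    loss = 0.0
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi))
        m = -yi * z  # labels are assumed to be in {-1, +1}
        # log(1 + exp(m)), computed without overflow
        loss += max(m, 0.0) + math.log1p(math.exp(-abs(m)))
    penalty = sum(math.sqrt(sum(w[j] ** 2 for j in g)) for g in groups)
    return loss / len(X) + lam * penalty


def fit_group_logistic(X, y, groups, lam=0.1, step=0.5, sweeps=200):
    """Cycle over groups; each update is a gradient step plus group shrinkage."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(sweeps):
        for g in groups:
            # Gradient of the mean logistic loss w.r.t. this group's weights.
            grad = [0.0] * len(g)
            for xi, yi in zip(X, y):
                z = sum(wj * xj for wj, xj in zip(w, xi))
                coeff = -yi * sigmoid(-yi * z) / n
                for k, j in enumerate(g):
                    grad[k] += coeff * xi[j]
            v = [w[j] - step * grad[k] for k, j in enumerate(g)]
            norm = math.sqrt(sum(vk * vk for vk in v))
            # Group soft-threshold: shrink the block, or zero it as a whole.
            scale = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
            for k, j in enumerate(g):
                w[j] = scale * v[k]
    return w


# Toy data (hypothetical): features 0-1 follow the label, features 2-3
# are balanced so that they carry no information about it.
X = [
    [1.0, 0.5, 1.0, -1.0], [1.0, 0.5, -1.0, 1.0],
    [0.8, 1.0, 1.0, 1.0], [0.8, 1.0, -1.0, -1.0],
    [-1.0, -0.5, 1.0, -1.0], [-1.0, -0.5, -1.0, 1.0],
    [-0.8, -1.0, 1.0, 1.0], [-0.8, -1.0, -1.0, -1.0],
]
y = [1, 1, 1, 1, -1, -1, -1, -1]
groups = [[0, 1], [2, 3]]

w = fit_group_logistic(X, y, groups, lam=0.05)
print("weights:", w)
print("objective:", objective(w, X, y, groups, 0.05))
```

On this toy data the second group carries no label information, so its weights stay at zero while the first group's weights grow. That is the group-level sparsity the ℓ2,1 penalty is meant to encourage: a whole block of features is kept or dropped together.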