Conference: Annual conference on Neural Information Processing Systems

Object based Scene Representations using Fisher Scores of Local Subspace Projections


Abstract

Several works have shown that deep CNNs can be easily transferred across datasets, e.g. from object recognition on ImageNet to object detection on Pascal VOC. Less clear, however, is the ability of CNNs to transfer knowledge across tasks. A common example of such transfer is scene classification, which should leverage localized object detections to recognize holistic visual concepts. While this problem is currently addressed with Fisher vector representations, these are shown to be ineffective for the high-dimensional and highly non-linear features extracted by modern CNNs. It is argued that this is mostly due to their reliance on a model, the Gaussian mixture with diagonal covariances, that has a very limited ability to capture the second-order statistics of CNN features. This problem is addressed by adopting a better model, the mixture of factor analyzers (MFA), which approximates the non-linear data manifold by a collection of local subspaces. The Fisher score with respect to the MFA (MFA-FS) is derived and proposed as an image representation for holistic image classifiers. Extensive experiments show that the MFA-FS achieves state-of-the-art performance for object-to-scene transfer, and that this transfer actually outperforms training a scene CNN on a large scene dataset. The two representations are also shown to be complementary, in the sense that their combination outperforms each representation on its own. When combined, they produce a state-of-the-art scene classifier.
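The abstract does not spell out the MFA-FS formula. As a minimal sketch, assuming each MFA component is a Gaussian N(mu_k, S_k) with S_k = Lam_k Lam_k^T + diag(psi_k), the NumPy code below computes one natural Fisher-score instantiation: the gradient of the summed log-likelihood of an image's local features with respect to each factor-loading matrix Lam_k, which equals sum_i gamma_ik (S_k^-1 d_ik d_ik^T S_k^-1 - S_k^-1) Lam_k, where d_ik = x_i - mu_k and gamma_ik is the posterior of component k. The function names, the per-component noise psi_k, and the choice to differentiate only with respect to Lam_k are illustrative assumptions; the paper's exact parameterization and normalization may differ.

```python
import numpy as np

def mfa_log_gauss(X, mu, Lam, psi):
    """log N(x; mu, Lam Lam^T + diag(psi)) for every row x of X.

    X: (N, D) local features, mu: (D,), Lam: (D, R) factor loadings,
    psi: (D,) diagonal noise variances (assumed parameterization).
    """
    D = X.shape[1]
    # Full covariance of one component: low-rank local subspace plus diagonal noise.
    S = Lam @ Lam.T + np.diag(psi)
    _, logdet = np.linalg.slogdet(S)
    diff = X - mu
    maha = np.einsum("nd,nd->n", diff @ np.linalg.inv(S), diff)
    return -0.5 * (D * np.log(2 * np.pi) + logdet + maha)

def mfa_fisher_score(X, weights, mus, Lams, psis):
    """Gradient of sum_i log p(x_i) w.r.t. each loading matrix Lam_k.

    Returns shape (K, D, R); flattening it gives a K*D*R image-level vector.
    """
    K = len(weights)
    # log w_k + log N(x_i | k) for every feature i and component k: (N, K).
    log_joint = np.stack(
        [np.log(weights[k]) + mfa_log_gauss(X, mus[k], Lams[k], psis[k])
         for k in range(K)],
        axis=1,
    )
    # Posterior responsibilities gamma_ik via a numerically stable softmax.
    log_joint -= log_joint.max(axis=1, keepdims=True)
    gamma = np.exp(log_joint)
    gamma /= gamma.sum(axis=1, keepdims=True)

    grads = []
    for k in range(K):
        S_inv = np.linalg.inv(Lams[k] @ Lams[k].T + np.diag(psis[k]))
        Z = (X - mus[k]) @ S_inv  # row i is (S_k^-1 d_ik)^T since S_k is symmetric
        # sum_i gamma_ik (S^-1 d d^T S^-1 - S^-1), then right-multiply by Lam_k.
        M = (Z * gamma[:, k:k + 1]).T @ Z - gamma[:, k].sum() * S_inv
        grads.append(M @ Lams[k])
    return np.stack(grads)

# Usage on random stand-ins for CNN features of image patches.
rng = np.random.default_rng(0)
D, R, K, N = 8, 2, 3, 50
X = rng.normal(size=(N, D))
weights = np.full(K, 1.0 / K)
mus = rng.normal(size=(K, D))
Lams = rng.normal(scale=0.5, size=(K, D, R))
psis = np.full((K, D), 0.5)
fs = mfa_fisher_score(X, weights, mus, Lams, psis)
print(fs.shape)  # (3, 8, 2)
```

Setting every Lam_k to zero reduces each component to a diagonal-covariance Gaussian, the model the abstract argues is too weak; the loadings are what let each component capture correlations along its local subspace. Fisher vectors are also commonly power- and L2-normalized before being passed to a linear classifier, a step omitted here.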
