IEEE Transactions on Pattern Analysis and Machine Intelligence

Semantic Fisher Scores for Task Transfer: Using Objects to Classify Scenes

Abstract

The transfer of a convolutional neural network (CNN) trained to recognize objects to the task of scene classification is considered. A Bag-of-Semantics (BoS) representation is first induced by feeding scene image patches to the object CNN and representing the scene image by the ensuing bag of posterior class probability vectors (semantic posteriors). The encoding of the BoS with a Fisher vector (FV) is then studied. A link is established between the FV of any probabilistic model and the Q-function of the expectation-maximization (EM) algorithm used to estimate its parameters by maximum likelihood. This enables 1) immediate derivation of FVs for any model for which an EM algorithm exists, and 2) leveraging efficient implementations from the EM literature for the computation of FVs. It is then shown that standard FVs, such as those derived from Gaussian or even Dirichlet mixtures, are unsuccessful for the transfer of semantic posteriors, due to the highly non-linear nature of the probability simplex. The analysis of these FVs shows that significant benefits can ensue by 1) designing FVs in the natural parameter space of the multinomial distribution, and 2) adopting sophisticated probabilistic models of semantic feature covariance. The combination of these two insights leads to the encoding of the BoS in the natural parameter space of the multinomial, using a vector of Fisher scores derived from a mixture of factor analyzers (MFA). A network implementation of the MFA Fisher Score (MFA-FS), denoted MFAFSNet, is finally proposed to enable end-to-end training. Experiments with various object CNNs and datasets show that the approach has state-of-the-art transfer performance. Somewhat surprisingly, the scene classification results are superior to those of a CNN explicitly trained for scene classification on a large scene dataset (Places). This suggests that holistic analysis is insufficient for scene classification: the modeling of local object semantics appears to be at least equally important. The two approaches are also shown to be strongly complementary, leading to very large scene classification gains when combined and outperforming all previous scene classification approaches by a sizable margin.
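The link between Fisher vectors and the EM Q-function can be illustrated with Fisher's identity, a standard result for latent-variable models. The derivation below is a sketch in generic notation (observed feature $x$, latent variable $z$, parameters $\theta$, all introduced here for illustration); the paper's own derivation and notation may differ. It shows that the Fisher score equals the gradient of the Q-function evaluated at the current parameters, and what that gradient looks like for the means of a Gaussian mixture ($z = k$, weights $w_k$, means $\mu_k$, covariances $\Sigma_k$):

```latex
\begin{aligned}
\nabla_\theta \log p(x;\theta)
  &= \frac{1}{p(x;\theta)} \sum_{z} \nabla_\theta\, p(x,z;\theta)
   = \sum_{z} \frac{p(x,z;\theta)}{p(x;\theta)}\, \nabla_\theta \log p(x,z;\theta)
   = \sum_{z} p(z \mid x;\theta)\, \nabla_\theta \log p(x,z;\theta), \\[4pt]
Q(\theta;\theta') &= \sum_{z} p(z \mid x;\theta') \log p(x,z;\theta)
  \;\Longrightarrow\;
  \nabla_\theta \log p(x;\theta)\Big|_{\theta=\theta'}
  = \nabla_\theta Q(\theta;\theta')\Big|_{\theta=\theta'}, \\[4pt]
\frac{\partial}{\partial \mu_k} \log p(x;\theta)
  &= p(k \mid x;\theta)\, \Sigma_k^{-1}(x - \mu_k),
\qquad
\mathcal{G}_X = \sum_{t=1}^{N} \nabla_\theta \log p(x_t;\theta),
\qquad
\mathrm{FV}(X) = F^{-1/2}\, \mathcal{G}_X .
\end{aligned}
```

The posterior $p(k \mid x;\theta)$ in the last line is the E-step responsibility, so any existing E-step implementation can be reused to compute the score; only the gradient of the complete-data log-likelihood changes from model to model.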
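A minimal NumPy sketch of the BoS-to-FV pipeline follows, under several stated assumptions: random logits stand in for an object CNN's patch outputs; the natural-parameter map is assumed to be the element-wise log of the semantic posteriors; and a diagonal-covariance Gaussian mixture with hand-set parameters replaces the paper's mixture of factor analyzers. It therefore computes a standard (non-MFA) Fisher vector with respect to the mixture means, not the MFA-FS itself. All function names are hypothetical.

```python
import numpy as np

def bag_of_semantics(patch_logits):
    """Softmax each patch's object-class scores into a posterior vector.

    patch_logits: (N, K) array, one row of (hypothetical) object-CNN scores per patch.
    """
    z = patch_logits - patch_logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

def to_natural_params(posteriors, eps=1e-8):
    """Assumed natural-parameter map: element-wise log of the multinomial probabilities."""
    return np.log(posteriors + eps)

def responsibilities(X, weights, means, variances):
    """E-step posteriors gamma[t, k] = p(k | x_t) under a diagonal-covariance GMM."""
    D = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]                 # (N, C, D)
    log_det = np.log(variances).sum(axis=1)                  # (C,)
    maha = (diff ** 2 / variances[None]).sum(axis=2)         # (N, C)
    log_gauss = -0.5 * (D * np.log(2 * np.pi) + log_det[None] + maha)
    log_joint = np.log(weights)[None] + log_gauss            # log w_k + log N(x | mu_k, var_k)
    log_joint -= log_joint.max(axis=1, keepdims=True)        # stable normalization
    gamma = np.exp(log_joint)
    return gamma / gamma.sum(axis=1, keepdims=True)

def fisher_vector_means(X, weights, means, variances):
    """Fisher-information-normalized score w.r.t. the GMM means (GMM stand-in, not MFA)."""
    N = X.shape[0]
    gamma = responsibilities(X, weights, means, variances)   # (N, C)
    sigma = np.sqrt(variances)
    diff = (X[:, None, :] - means[None]) / sigma[None]       # (N, C, D)
    G = (gamma[:, :, None] * diff).sum(axis=0)               # (C, D)
    G /= N * np.sqrt(weights)[:, None]
    fv = G.ravel()
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                   # power normalization
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv                     # L2 normalization

# Usage with synthetic data: K object classes, C mixture components, N patches.
rng = np.random.default_rng(0)
K, C, N = 10, 3, 50
bos = bag_of_semantics(rng.normal(size=(N, K)))
X = to_natural_params(bos)
weights = np.full(C, 1.0 / C)
means = rng.normal(size=(C, K)) + X.mean(axis=0)
variances = np.ones((C, K))
fv = fisher_vector_means(X, weights, means, variances)
print(fv.shape)  # (30,)
```

Here `responsibilities` is exactly the E-step of EM for the mixture, which mirrors the abstract's point that EM machinery can be reused to compute FVs. Swapping in an MFA would change both the E-step and the per-component gradient blocks; that extension is not shown.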
