Fusing Object Semantics and Deep Appearance Features for Scene Recognition


Abstract

Scene images generally exhibit large intra-class variety and high inter-class similarity because of complicated appearances, subtle differences, and ambiguous categorization. Hence, it is difficult to achieve satisfactory accuracy with a single representation. To address this issue, we present a comprehensive representation for scene recognition that fuses deep features extracted from three discriminative views: object semantics, global appearance, and contextual appearance. These views provide diverse and complementary features. The object semantics representation of the scene image, denoted by spatial-layout-maintained object semantics features, is extracted from the output of a deep-learning-based multi-class detector using spatial Fisher vectors, which simultaneously encode the category and layout information of objects. A multi-direction long short-term memory (LSTM) based model is built to represent the contextual information of the scene image, and the activation of the fully connected layer of a convolutional neural network is used to represent its global appearance. These three kinds of deep features are then fused to draw the final conclusion for scene recognition. Extensive experiments are conducted to evaluate the proposed comprehensive representation on three benchmark scene image databases. The results show that the three deep features strongly complement each other and that their fusion effectively improves recognition performance. The proposed method achieves scene recognition accuracies of 89.51% on the MIT67 database, 78.93% on the SUN397 database, and 57.27% on the Places365 database, which are higher than the accuracies obtained by the latest reported deep-learning-based scene recognition methods.
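
The abstract describes three feature views that are fused before final classification. Below is a minimal Python sketch of that fusion idea, assuming simple L2-normalized concatenation of the three per-image feature vectors followed by a linear SVM; the random stand-in features and the LinearSVC classifier are illustrative assumptions, not the paper's exact encoders or fusion scheme.

    # Minimal sketch of three-view feature fusion for scene recognition.
    # The real pipeline would replace the random stand-in features with
    # (1) spatial Fisher vectors over object-detector outputs,
    # (2) multi-direction LSTM contextual features, and
    # (3) CNN fully-connected-layer activations.
    import numpy as np
    from sklearn.svm import LinearSVC

    def l2_normalize(x, eps=1e-12):
        """L2-normalize each row so no single view dominates the fused vector."""
        return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

    def fuse_views(object_semantics, context, global_appearance):
        """Concatenate the three per-image feature views into one representation."""
        views = [l2_normalize(v) for v in (object_semantics, context, global_appearance)]
        return np.concatenate(views, axis=1)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        n_train, n_test, n_classes = 200, 50, 5

        def fake_views(n):
            # Stand-in feature matrices with plausible but arbitrary dimensions.
            return (rng.normal(size=(n, 128)),   # object-semantics view
                    rng.normal(size=(n, 256)),   # contextual view
                    rng.normal(size=(n, 512)))   # global-appearance view

        X_train = fuse_views(*fake_views(n_train))
        X_test = fuse_views(*fake_views(n_test))
        y_train = rng.integers(0, n_classes, size=n_train)

        clf = LinearSVC(C=1.0).fit(X_train, y_train)
        print("predicted scene labels:", clf.predict(X_test)[:10])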

Bibliographic Information

  • Source: IEEE Transactions on Circuits and Systems for Video Technology
  • Author Affiliations

    Nanjing Univ Posts & Telecommun, Engn Res Ctr Wideband Wireless Commun Technol, Minist Educ, Nanjing 210003, Jiangsu, Peoples R China;

    Nanjing Univ Posts & Telecommun, Coll Commun & Informat Engn, Nanjing 210003, Jiangsu, Peoples R China;

    Nanjing Univ Posts & Telecommun, Engn Res Ctr Wideband Wireless Commun Technol, Minist Educ, Nanjing 210003, Jiangsu, Peoples R China;

    Nanjing Univ Posts & Telecommun, Engn Res Ctr Wideband Wireless Commun Technol, Minist Educ, Nanjing 210003, Jiangsu, Peoples R China;

    Nanjing Univ Posts & Telecommun, Engn Res Ctr Wideband Wireless Commun Technol, Minist Educ, Nanjing 210003, Jiangsu, Peoples R China;

  • Indexing Information
  • Original Format: PDF
  • Language: eng
  • Chinese Library Classification
  • Keywords

    Comprehensive representation; contextual feature; object semantics; scene recognition;

