Coordinate CNNs and LSTMs to categorize scene images with multi-views and multi-levels of abstraction

Bai Shuang; Tang Huadong; An Shan

Abstract

Scene categorization is a challenging task in computer vision because of the complexity of scene images. To categorize scene images effectively, in this paper we propose to coordinate Convolutional Neural Networks (CNNs) and Long Short-Term Memory networks (LSTMs) to perform scene categorization with multi-views and multi-levels of abstraction. Specifically, to exploit the complementary properties of features at different levels of abstraction, we employ CNNs to extract features at multiple levels of abstraction based on their hierarchical structure. Furthermore, to deal with variations in scene image content, we represent each image with multiple views; to take the correlation between image views into account, we treat the view features from the same image as a sequence and employ LSTMs to perform classification. With the proposed method, information from multiple views and multiple levels of abstraction can be fully used in a single framework. We evaluate the proposed method on two challenging scene datasets, MIT indoor scene 67 and SUN 397. The results demonstrate the effectiveness of using CNNs and LSTMs to categorize scene images with multi-views and multi-levels of abstraction. Comparisons with state-of-the-art methods show that the proposed method outperforms all the other methods compared. (C) 2018 Elsevier Ltd. All rights reserved.
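
The central idea of the abstract, treating the view features of one image as a sequence and classifying that sequence with an LSTM, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not say how views are extracted, which CNN layers are used, or how the levels are combined, so the sketch assumes 5 views, three hypothetical CNN levels with 64, 128 and 256 dimensions, concatenation of the levels per view, random placeholder arrays in place of real CNN features, and untrained random weights. The function names init_lstm and lstm_classify are illustrative only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_lstm(input_dim, hidden_dim, num_classes, rng):
    """Randomly initialise LSTM and classifier weights (untrained, illustration only)."""
    scale = 0.1
    return {
        # One stacked matrix holding the input, forget, cell and output gates.
        "W": rng.normal(0.0, scale, (4 * hidden_dim, input_dim + hidden_dim)),
        "b": np.zeros(4 * hidden_dim),
        "W_out": rng.normal(0.0, scale, (num_classes, hidden_dim)),
        "b_out": np.zeros(num_classes),
    }

def lstm_classify(view_features, params):
    """Run an LSTM over the view sequence and return class probabilities.

    view_features: array of shape (num_views, feature_dim), one row per view.
    """
    hidden_dim = params["b"].shape[0] // 4
    h = np.zeros(hidden_dim)  # hidden state
    c = np.zeros(hidden_dim)  # cell state
    for x in view_features:   # each image view is one time step
        z = params["W"] @ np.concatenate([x, h]) + params["b"]
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    # Classify from the final hidden state with a softmax layer.
    logits = params["W_out"] @ h + params["b_out"]
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

rng = np.random.default_rng(0)
num_views = 5          # assumption: the abstract does not give the number of views
num_classes = 67       # matches the MIT indoor scene 67 dataset
level_dims = (64, 128, 256)  # assumption: three hypothetical CNN levels

# Placeholder features standing in for CNN activations at each level, per view.
levels = [rng.normal(size=(num_views, d)) for d in level_dims]
# Assumption: levels are combined by concatenation for each view.
view_features = np.concatenate(levels, axis=1)

params = init_lstm(view_features.shape[1], 32, num_classes, rng)
probs = lstm_classify(view_features, params)
print(view_features.shape, probs.shape)
```

Because the weights are random, the predicted probabilities carry no meaning; the sketch only shows how the data flow from per-view, multi-level features through a recurrent classifier to one prediction per image.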

Bibliographic record

Source
Expert Systems with Applications | 2019, Issue 4 | pp. 298-309 | 12 pages
Authors
Bai Shuang; Tang Huadong; An Shan
Author affiliations

Beijing Jingdong Shangke Informat Technol Co Ltd, Beijing, Peoples R China;

Beijing Jiaotong Univ, Sch Elect & Informat Engn, Beijing, Peoples R China;

Original format: PDF
Language: English
Keywords
Scene categorization; Multi-views; Multi-levels of abstraction; Long short-term memory; Convolutional neural networks;

Similar literature

1. Scale-space multi-view bag of words for scene categorization [J]. Davar Giveki. Multimedia Tools and Applications. 2021, Issue 1.
2. Perceptually learning multi-view sparse representation for scene categorization [J]. Yin Weibin, Xu Dongsheng, Wang Zheng. Journal of Visual Communication & Image Representation. 2019, Issue Apra.
3. Multi-hour and multi-site air quality index forecasting in Beijing using CNN, LSTM, CNN-LSTM, and spatiotemporal clustering [J]. Yan Rui, Liao Jiaqiang, Yang Jie. Expert Systems with Applications. 2021, May issue.
4. Novel CNN architecture with residual learning and deep supervision for large-scale scene image categorization [C]. Hussein A. Al-Barazanchi, Hussam Qassim, Abhishek Verma. IEEE 7th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference. 2016.
5. CNNs versus LSTMs for Time Series Forecasting [D]. Bhurtel, Bidur Prasad. 2021.
6. Two-stage CNNs for computerized BI-RADS categorization in breast ultrasound images [O]. Yunzhi Huang, Luyi Han, Haoran Dou. 2019.
7. Robust 3D Hand Pose Estimation in Single Depth Images: from Single-View CNN to Multi-View CNNs [O]. Ge, Liuhao; Liang, Hui; Yuan, Junsong. 2016.
