IEEE Transactions on Pattern Analysis and Machine Intelligence

Human Parsing with Contextualized Convolutional Neural Network

Abstract

In this work, we address the human parsing task with a novel Contextualized Convolutional Neural Network (Co-CNN) architecture, which integrates the cross-layer context, global image-level context, semantic edge context, within-super-pixel context and cross-super-pixel neighborhood context into a unified network. Given an input human image, Co-CNN produces the pixel-wise categorization in an end-to-end way. First, the cross-layer context is captured by our basic local-to-global-to-local structure, which hierarchically combines the global semantic information and the local fine details across different convolutional layers. Second, the global image-level label prediction is used as an auxiliary objective in the intermediate layer of the Co-CNN, and its outputs are further used to guide the feature learning in subsequent convolutional layers, leveraging the global image-level context. Third, semantic edge context is further incorporated into Co-CNN, where the high-level semantic boundaries are leveraged to guide pixel-wise labeling. Finally, to further utilize the local super-pixel contexts, within-super-pixel smoothing and cross-super-pixel neighborhood voting are formulated as natural sub-components of the Co-CNN to achieve local label consistency in both the training and testing processes. Comprehensive evaluations on two public datasets demonstrate the significant superiority of our Co-CNN over other state-of-the-art methods for human parsing. In particular, the F-1 score on the large dataset [1] reaches 81.72% with Co-CNN, significantly higher than the 62.81% and 64.38% achieved by the state-of-the-art algorithms M-CNN [2] and ATR [1], respectively. By utilizing our newly collected large dataset for training, our Co-CNN can achieve an F-1 score of 85.36%.
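
The abstract outlines several concrete architectural components. The sketch below is a minimal, hypothetical PyTorch rendering of the two ideas that translate most directly into code: the local-to-global-to-local backbone with cross-layer feature fusion, and the auxiliary image-level label prediction whose output is broadcast back to guide later convolutional layers, plus a simple stand-in for within-super-pixel smoothing. It is not the authors' implementation; the layer widths, NUM_LABELS, NUM_TAGS and all helper names are assumptions made only for illustration.

# Minimal illustrative sketch (not the authors' code) of the Co-CNN ideas
# described above, written against PyTorch. Layer sizes and label counts
# are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_LABELS = 18   # assumed number of pixel-wise parsing categories
NUM_TAGS = 18     # assumed number of image-level labels (auxiliary task)

def conv_block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True))

class CoCNNSketch(nn.Module):
    """Local-to-global-to-local backbone; the global image-level prediction
    is tiled and concatenated back into the decoding layers."""

    def __init__(self):
        super().__init__()
        # local-to-global: downsample to gather increasingly global context
        self.enc1 = conv_block(3, 64)
        self.enc2 = conv_block(64, 128)
        self.enc3 = conv_block(128, 256)
        self.pool = nn.MaxPool2d(2)

        # auxiliary global image-level label prediction
        self.global_fc = nn.Linear(256, NUM_TAGS)

        # global-to-local: upsample and fuse fine details via cross-layer
        # concatenation; predicted image-level labels enter as extra channels
        self.dec2 = conv_block(256 + 128 + NUM_TAGS, 128)
        self.dec1 = conv_block(128 + 64 + NUM_TAGS, 64)

        # pixel-wise parsing head and a semantic-edge head (boundary guidance)
        self.parse_head = nn.Conv2d(64, NUM_LABELS, 1)
        self.edge_head = nn.Conv2d(64, 1, 1)

    def forward(self, x):
        f1 = self.enc1(x)              # full resolution, fine local details
        f2 = self.enc2(self.pool(f1))  # 1/2 resolution
        f3 = self.enc3(self.pool(f2))  # 1/4 resolution, most global view

        # image-level prediction from globally pooled features
        tag_logits = self.global_fc(F.adaptive_avg_pool2d(f3, 1).flatten(1))
        tag_prob = torch.sigmoid(tag_logits)

        def tile(v, ref):              # broadcast a label vector to a feature map
            return v[:, :, None, None].expand(-1, -1, ref.size(2), ref.size(3))

        u2 = F.interpolate(f3, size=f2.shape[2:], mode='bilinear', align_corners=False)
        d2 = self.dec2(torch.cat([u2, f2, tile(tag_prob, f2)], dim=1))

        u1 = F.interpolate(d2, size=f1.shape[2:], mode='bilinear', align_corners=False)
        d1 = self.dec1(torch.cat([u1, f1, tile(tag_prob, f1)], dim=1))

        return self.parse_head(d1), self.edge_head(d1), tag_logits

def within_superpixel_smoothing(prob, sp):
    """Average per-pixel label probabilities inside each super-pixel.
    prob: (C, H, W) softmax scores; sp: (H, W) integer super-pixel ids."""
    smoothed = prob.clone()
    for s in sp.unique():
        mask = sp == s
        smoothed[:, mask] = prob[:, mask].mean(dim=1, keepdim=True)
    return smoothed

In the actual Co-CNN, the pixel-wise parsing loss, the image-level prediction loss and the semantic-edge loss would be trained jointly end to end; cross-super-pixel neighborhood voting is omitted from this sketch for brevity.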