Venue: IEEE Conference on Computer Vision and Pattern Recognition

Recombinator Networks: Learning Coarse-to-Fine Feature Aggregation



Abstract

Deep neural networks with alternating convolutional, max-pooling, and decimation layers are widely used in state-of-the-art architectures for computer vision. Max-pooling purposefully discards precise spatial information in order to create features that are more robust, typically organized as lower-resolution spatial feature maps. For some tasks, such as whole-image classification, max-pooling-derived features are well suited; however, for tasks requiring precise localization, such as pixel-level prediction and segmentation, max-pooling destroys exactly the information required to perform well. Precise localization may be preserved by shallow convnets without pooling, but at the expense of robustness. Can we have our max-pooled multilayered cake and eat it too? Several papers have proposed summation- and concatenation-based methods for combining upsampled coarse, abstract features with finer features to produce robust pixel-level predictions. Here we introduce another model - dubbed Recombinator Networks - where coarse features inform finer features early in their formation, such that finer features can make use of several layers of computation in deciding how to use coarse features. The model is trained once, end-to-end, and performs better than summation-based architectures, reducing the error from the previous state of the art on two facial keypoint datasets, AFW and AFLW, by 30%, and beating the current state of the art on 300W without using extra data. We improve performance even further by adding a denoising prediction model based on a novel convnet formulation.
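The core merge described in the abstract - upsample the coarse, abstract feature map, concatenate it with the finer map along the channel axis, and let further convolution decide how to use the coarse information (rather than simply summing the two branches) - can be sketched minimally in NumPy. This is an illustrative sketch, not the paper's implementation: the nearest-neighbor upsampling, the 1x1 convolution, and all shapes and weights below are hypothetical choices for demonstration.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbor 2x upsampling of a (C, H, W) feature map.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def conv1x1(x, w):
    # 1x1 convolution: mixes channels at each spatial location.
    # x: (C_in, H, W), w: (C_out, C_in) -> (C_out, H, W)
    return np.tensordot(w, x, axes=([1], [0]))

def recombinator_merge(coarse, fine, w):
    # Recombinator-style merge: the upsampled coarse map is
    # concatenated with the fine map along the channel axis and
    # passed through further (learned) computation, so the fine
    # branch can decide *how* to use coarse features - in contrast
    # to summation-based merging, which fixes the combination.
    merged = np.concatenate([upsample2x(coarse), fine], axis=0)
    return np.maximum(conv1x1(merged, w), 0.0)  # ReLU nonlinearity

rng = np.random.default_rng(0)
coarse = rng.standard_normal((8, 4, 4))   # low-res, abstract features
fine = rng.standard_normal((4, 8, 8))     # high-res, precise features
w = rng.standard_normal((4, 12)) * 0.1    # stand-in for learned weights

out = recombinator_merge(coarse, fine, w)
print(out.shape)  # (4, 8, 8): coarse context carried to fine resolution
```

In a full network this merge would be repeated at every resolution level, with real learned convolutions in place of the single 1x1 mixing shown here.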

