International Conference on Computer Vision

Pix2Vox: Context-Aware 3D Reconstruction From Single and Multi-View Images

Abstract

Recovering the 3D representation of an object from single-view or multi-view RGB images with deep neural networks has attracted increasing attention in the past few years. Several mainstream works (e.g., 3D-R2N2) use recurrent neural networks (RNNs) to sequentially fuse multiple feature maps extracted from the input images. However, when given the same set of input images in different orders, RNN-based approaches are unable to produce consistent reconstruction results. Moreover, due to long-term memory loss, RNNs cannot fully exploit the input images to refine reconstruction results. To solve these problems, we propose Pix2Vox, a novel framework for single-view and multi-view 3D reconstruction. Using a well-designed encoder-decoder, it generates a coarse 3D volume from each input image. A context-aware fusion module is then introduced to adaptively select high-quality reconstructions for each part (e.g., table legs) from the different coarse 3D volumes, yielding a fused 3D volume. Finally, a refiner further refines the fused 3D volume to produce the final output. Experimental results on the ShapeNet and Pix3D benchmarks indicate that the proposed Pix2Vox outperforms state-of-the-art methods by a large margin. Furthermore, the proposed method is 24 times faster than 3D-R2N2 in terms of backward inference time. Experiments on unseen 3D categories of ShapeNet demonstrate the superior generalization ability of our method.
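The context-aware fusion step described in the abstract can be pictured as per-voxel weighting across views: each coarse volume is scored voxel by voxel, and a softmax over the view axis decides which view's reconstruction dominates each region. The PyTorch sketch below is a minimal illustration of that idea only; the scoring network (`score_net`) and its channel widths are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ContextAwareFusion(nn.Module):
    """Per-voxel weighted fusion of coarse volumes from multiple views."""

    def __init__(self, hidden_channels: int = 9):
        super().__init__()
        # Small 3D conv net that scores the quality of each coarse volume
        # at every voxel (hypothetical layer sizes, for illustration only).
        self.score_net = nn.Sequential(
            nn.Conv3d(1, hidden_channels, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv3d(hidden_channels, 1, kernel_size=3, padding=1),
        )

    def forward(self, coarse_volumes: torch.Tensor) -> torch.Tensor:
        # coarse_volumes: (batch, n_views, D, H, W), occupancies in [0, 1]
        b, n, d, h, w = coarse_volumes.shape
        scores = self.score_net(coarse_volumes.reshape(b * n, 1, d, h, w))
        scores = scores.reshape(b, n, d, h, w)
        # Softmax over the view axis: each voxel adaptively favors the
        # views that reconstructed that part (e.g., a table leg) best.
        weights = torch.softmax(scores, dim=1)
        return (weights * coarse_volumes).sum(dim=1)  # (batch, D, H, W)

# Example: fuse three 32^3 coarse volumes predicted from three views.
fusion = ContextAwareFusion()
fused = fusion(torch.rand(2, 3, 32, 32, 32))
print(fused.shape)  # torch.Size([2, 32, 32, 32])
```

Because the softmax is taken over the view dimension, the fused output does not depend on the order in which the input images arrive, which is exactly the consistency property the abstract says RNN-based fusion lacks.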