International Conference on Computer Vision

Pix2Vox: Context-Aware 3D Reconstruction From Single and Multi-View Images



Abstract

Recovering the 3D representation of an object from single-view or multi-view RGB images with deep neural networks has attracted increasing attention in the past few years. Several mainstream works (e.g., 3D-R2N2) use recurrent neural networks (RNNs) to sequentially fuse multiple feature maps extracted from the input images. However, given the same set of input images in different orders, RNN-based approaches are unable to produce consistent reconstruction results. Moreover, due to long-term memory loss, RNNs cannot fully exploit the input images to refine reconstruction results. To solve these problems, we propose a novel framework for single-view and multi-view 3D reconstruction, named Pix2Vox. Using a well-designed encoder-decoder, it generates a coarse 3D volume from each input image. A context-aware fusion module is then introduced to adaptively select high-quality reconstructions for each part (e.g., table legs) from the different coarse 3D volumes, producing a fused 3D volume. Finally, a refiner further refines the fused 3D volume to generate the final output. Experimental results on the ShapeNet and Pix3D benchmarks indicate that the proposed Pix2Vox outperforms state-of-the-art methods by a large margin. Furthermore, the proposed method is 24 times faster than 3D-R2N2 in terms of backward inference time. Experiments on unseen ShapeNet 3D categories demonstrate the superior generalization ability of our method.
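The context-aware fusion step described in the abstract can be sketched as a per-voxel softmax over learned score maps: each view contributes a coarse volume and a score map, and the fused volume is the score-weighted sum across views. The sketch below is illustrative only; function names and array shapes are assumptions, and the actual Pix2Vox implementation learns the score maps with a convolutional network.

```python
import numpy as np

def context_aware_fusion(coarse_volumes, score_maps):
    """Fuse per-view coarse volumes using softmax-normalized score maps.

    coarse_volumes: (n_views, D, D, D) per-view occupancy predictions in [0, 1]
    score_maps:     (n_views, D, D, D) unnormalized per-voxel quality scores
    returns:        (D, D, D) fused volume
    """
    # Softmax across the view axis turns scores into per-voxel fusion weights,
    # so for each voxel the highest-scoring view dominates.
    exp_scores = np.exp(score_maps - score_maps.max(axis=0, keepdims=True))
    weights = exp_scores / exp_scores.sum(axis=0, keepdims=True)
    # Weighted sum adaptively picks the best reconstruction for each part.
    return (weights * coarse_volumes).sum(axis=0)
```

Because the weighting is a symmetric reduction over the view axis, the fused result does not depend on the order of the input images, which is exactly the permutation-consistency property the RNN-based baselines lack.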
