International Conference on Computer Vision

Pix2Vox: Context-Aware 3D Reconstruction From Single and Multi-View Images



Abstract

Recovering the 3D representation of an object from single-view or multi-view RGB images with deep neural networks has attracted increasing attention in the past few years. Several mainstream works (e.g., 3D-R2N2) use recurrent neural networks (RNNs) to sequentially fuse multiple feature maps extracted from the input images. However, given the same set of input images in different orders, RNN-based approaches are unable to produce consistent reconstruction results. Moreover, due to long-term memory loss, RNNs cannot fully exploit the input images to refine reconstruction results. To solve these problems, we propose a novel framework for single-view and multi-view 3D reconstruction, named Pix2Vox. Using a well-designed encoder-decoder, it generates a coarse 3D volume from each input image. A context-aware fusion module is then introduced to adaptively select high-quality reconstructions for each part (e.g., table legs) from the different coarse 3D volumes, producing a fused 3D volume. Finally, a refiner further refines the fused 3D volume to generate the final output. Experimental results on the ShapeNet and Pix3D benchmarks indicate that the proposed Pix2Vox outperforms state-of-the-art methods by a large margin. Furthermore, the proposed method is 24 times faster than 3D-R2N2 in terms of backward inference time. Experiments on unseen ShapeNet 3D categories demonstrate the superior generalization ability of our method.
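The context-aware fusion step described in the abstract can be sketched as a per-voxel softmax over learned score maps: each view contributes a coarse volume and a score map, and the fused volume is the score-weighted sum across views. The sketch below is illustrative only; function names and array shapes are assumptions, and the actual Pix2Vox implementation learns the score maps with a convolutional network.

```python
import numpy as np

def context_aware_fusion(coarse_volumes, score_maps):
    """Fuse per-view coarse volumes using softmax-normalized score maps.

    coarse_volumes: (n_views, D, D, D) per-view occupancy predictions in [0, 1]
    score_maps:     (n_views, D, D, D) unnormalized per-voxel quality scores
    returns:        (D, D, D) fused volume
    """
    # Softmax across the view axis turns scores into per-voxel fusion weights,
    # so for each voxel the highest-scoring view dominates.
    exp_scores = np.exp(score_maps - score_maps.max(axis=0, keepdims=True))
    weights = exp_scores / exp_scores.sum(axis=0, keepdims=True)
    # Weighted sum adaptively picks the best reconstruction for each part.
    return (weights * coarse_volumes).sum(axis=0)
```

Because the weighting is a symmetric reduction over the view axis, the fused result does not depend on the order of the input images, which is exactly the permutation-consistency property the RNN-based baselines lack.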
