We propose a feature-fusion network for pose estimation directly from RGB images without any depth information in this study.First,we introduce a two-stream architecture consisting of segmentation and regression streams.The segmentation stream processes the spatial embedding features and obtains the corresponding image crop.These features are further coupled with the image crop in the fusion network.Second,we use an efficient perspective-n-point(E-PnP)algorithm in the regression stream to extract robust spatial features between 3D and 2D keypoints.Finally,we perform iterative refinement with an end-to-end mechanism to improve the estimation performance.We conduct experiments on two public datasets of YCB-Video and the challenging Occluded-LineMOD.The results show that our method outperforms state-of-the-art approaches in both the speed and the accuracy.
展开▼
机译:Reproducing Images of Ancient Bronzes in the Early 20th Century: Rong Geng's Catalogue of Ritual Bronze Objects in the Wuying Palace =民初青铜器图录复制观念的转变:以容庚《武英殿彝器图录》为中心