
PPR-Net++: Accurate 6-D Pose Estimation in Stacked Scenarios



Abstract

Most supervised learning-based pose estimation methods for stacked scenes are trained on massive synthetic datasets. In most cases, the challenge is that a network learned on the training dataset is no longer optimal on the test dataset. To address this problem, we propose a pose regression network, PPR-Net++. It transforms each scene point into a point in centroid space, followed by a clustering process and a voting process. In the training phase, a mapping function between the network's critical parameter (i.e., the bandwidth of the clustering algorithm) and the compactness of the centroid distribution is obtained. This function is used to adapt the bandwidth between the centroid distributions of two different domains. In addition, to further improve pose estimation accuracy, the network also predicts the confidence of each point, based on its visibility and pose error. Only points with high confidence are allowed to vote for the final object pose. In our experiments, the method is trained on the IPA synthetic dataset and compared with the state-of-the-art algorithm. When tested on the public synthetic Siléane dataset, our method performs better on all eight objects, improving five of them by more than 5 points in average precision (AP). On the IPA real-world dataset, our method outperforms the state of the art by a large margin of 20 points. This lays a solid foundation for robot grasping in industrial scenarios.

Note to Practitioners—Our work is motivated by industrial product assembly based on robot grasping. Industrial parts are usually manufactured by numerically controlled machines and piled in bins. Our method can accurately estimate the poses of the visible parts; the pose of a part comprises its centroid position and spatial orientation. Combined with a depth camera, the algorithm allows an industrial robot to understand complex stacked scenes. We improve pose estimation accuracy so that parts can be assembled by robot grasping, without an additional pose adjuster. Our network can learn from a synthetic dataset and transfer to real-world data without a significant drop in accuracy. Since synthetic datasets are easily generated by computer simulation programs, sufficient training data are available. Experiments demonstrate that our method outperforms state-of-the-art pose estimation approaches.
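To make the inference pipeline in the abstract concrete (per-point centroid regression, clustering in centroid space, and confidence-gated voting), here is a minimal sketch. Everything in it is an assumption for illustration, not the paper's implementation: the per-point outputs (centroid, unit quaternion, confidence), scikit-learn's mean shift as the clusterer, the 0.7 confidence threshold, and the helper name vote_poses. The learned bandwidth-compactness mapping that adapts the bandwidth across domains is omitted.

```python
# Minimal sketch of the clustering-and-voting stage, assuming the network
# has already produced per-point predictions for one scene. Hypothetical
# shapes and parameter values; not the authors' code.
import numpy as np
from sklearn.cluster import MeanShift

def vote_poses(centroids, quaternions, confidences,
               bandwidth=0.02, conf_threshold=0.7):
    """Cluster per-point centroid predictions into object instances and
    let high-confidence points vote for one pose per instance.

    centroids:   (N, 3) predicted object centroid, one per scene point
    quaternions: (N, 4) predicted orientation as a unit quaternion
    confidences: (N,)   predicted per-point confidence in [0, 1]
    bandwidth:   mean-shift kernel bandwidth; in PPR-Net++ this is the
                 critical parameter adapted across domains via the
                 learned bandwidth-compactness mapping (not shown here)
    """
    # Only points with high confidence may vote for the final pose.
    keep = confidences >= conf_threshold
    c, q, w = centroids[keep], quaternions[keep], confidences[keep]

    # Group centroid predictions into object instances.
    labels = MeanShift(bandwidth=bandwidth).fit_predict(c)

    poses = []
    for k in np.unique(labels):
        m = labels == k
        # Confidence-weighted average of the centroid votes.
        t = np.average(c[m], axis=0, weights=w[m])
        # Align quaternion signs before averaging (q and -q encode the
        # same rotation), then renormalize the weighted mean.
        d = q[m] @ q[m][0]
        qk = q[m] * np.where(d < 0.0, -1.0, 1.0)[:, None]
        r = np.average(qk, axis=0, weights=w[m])
        poses.append((t, r / np.linalg.norm(r)))
    return poses
```

Averaging the votes of all points in a cluster, rather than trusting any single point, is what makes the final pose robust to per-point regression noise; gating by confidence additionally discards occluded or poorly predicted points before they can bias the average.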