A system and method for obtaining a 3D pose of an object using 2D images from a 2D camera and a learning-based neural network. The neural network extracts a plurality of features on the object from the 2D images and generates a heatmap for each extracted feature, where each heatmap identifies, by a color representation, the probability of the location of a feature point on the object. The method provides a feature point image that includes the feature points from the heatmaps on the 2D images, and estimates the 3D pose of the object by comparing the feature point image with a 3D virtual CAD model of the object.
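The last two steps, reading feature points off the heatmaps and comparing them with the CAD model, can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: each heatmap is a 2D grid of probabilities, the feature point is taken at the highest-probability pixel, and the comparison is a mean reprojection error under a pinhole camera with focal length f and principal point (cx, cy). The function names heatmap_peak, project, and reprojection_error are hypothetical, and the sketch uses only the Python standard library.

```python
import math

def heatmap_peak(heatmap):
    """Return ((x, y), score) for the highest-probability pixel of one heatmap.

    Assumption: the heatmap is a list of rows of probabilities.
    """
    best_xy, best_score = (0, 0), -math.inf
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_score:
                best_xy, best_score = (x, y), value
    return best_xy, best_score

def project(point, R, t, f, cx, cy):
    """Project one 3D model point into pixels with a pinhole camera.

    R is a 3x3 rotation (nested lists) and t a translation, i.e. a candidate pose.
    """
    xc, yc, zc = (
        sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)
    )
    return (f * xc / zc + cx, f * yc / zc + cy)

def reprojection_error(points_2d, model_points, R, t, f, cx, cy):
    """Mean pixel distance between detected points and projected model points."""
    total = 0.0
    for (u, v), p in zip(points_2d, model_points):
        pu, pv = project(p, R, t, f, cx, cy)
        total += math.hypot(pu - u, pv - v)
    return total / len(points_2d)

def gaussian_heatmap(width, height, u, v, sigma=1.5):
    """Synthetic stand-in for a network output: a Gaussian blob centered at (u, v)."""
    return [
        [math.exp(-((x - u) ** 2 + (y - v) ** 2) / (2 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]

# Hypothetical CAD feature points (meters) and a candidate pose 2 m from the camera.
MODEL_POINTS = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0)]
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

A pose search would evaluate reprojection_error for candidate rotations and translations and keep the pose with the smallest error. Whether the described method uses this particular cost or optimizer is not stated in the abstract.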