This paper presents a method for learning 3D object templates from view labeled object images. The 3D template is defined in a joint appearance and geometry space composed of deformable planar part templates placed at different 3D positions and orientations. Appearance of each part template is represented by Gabor filters, which are hierarchically grouped into line segments and geometric shapes. AND-OR trees are further used to quantize the possible geometry and appearance of part templates, so that learning can be done on a subsampled discrete space. Using information gain as a criterion, the best 3D template can be searched through the AND-OR trees using one bottom-up pass and one top-down pass. Experiments on a new car dataset with diverse views show that the proposed method can learn meaningful 3D car templates, and give satisfactory detection and view estimation performance. Experiments are also performed on a public car dataset, which show comparable performance with recent methods.
展开▼