This dissertation addresses the task of detecting instances of object categories in photographs. We propose modeling an object category as a collection of object parts linked together in a deformable configuration. We propose two different approaches to modeling the appearance of object parts, both of which provide robustness to intra-class variation and viewpoint change. The first approach models object parts as locally rigid assemblies of dense feature points, and part detection proceeds by incrementally matching the feature points between the model image and the test image. The second approach employs a discriminative classifier (Support Vector Machine) based on a descriptor that combines a sparse visual word histogram pyramid with a dense gradient and edge histogram pyramid.

We also propose two different approaches to modeling the inter-part relations, together with algorithms for efficiently learning the model parameters. The first approach uses a generative model of the joint probability distribution over the locations and visibility of all the object parts. The second approach employs a discriminative Conditional Random Field-based model to encode the relative geometry and co-occurrence constraints.