Learning articulated object pose is inherently difficult because the pose is high dimensional but has many structural constraints. Most existing work does not model such constraints and does not guarantee the geometric validity of its pose estimates, and therefore requires a post-processing step to recover the correct geometry if desired, which is cumbersome and sub-optimal. In this work, we propose to directly embed a kinematic object model into deep neural network learning for general articulated object pose estimation. The kinematic function is defined on the appropriately parameterized object motion variables. It is differentiable and can be used in gradient-descent-based optimization during network training. The prior knowledge of the object's geometric model is fully exploited, and the structure is guaranteed to be valid. We show convincing experimental results on a toy example and on the 3D human pose estimation problem. For the latter, we achieve state-of-the-art results on the Human3.6M dataset.
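To make the idea of a differentiable kinematic function concrete, the following is a minimal sketch, not the paper's implementation. It assumes a planar chain of three bones with fixed lengths, parameterized by relative joint angles. The names `forward_kinematics` and `loss_and_grad`, the bone lengths, the step size, and the iteration count are all illustrative assumptions. Plain gradient descent on the angles stands in for network training, and the bone lengths stay valid by construction because only the angles are optimized.

```python
import numpy as np

def forward_kinematics(angles, bone_lengths):
    """Map relative joint angles to 2D joint positions of a chain rooted at the origin."""
    cumulative = np.cumsum(angles)  # absolute orientation of each bone
    directions = np.stack([np.cos(cumulative), np.sin(cumulative)], axis=1)
    offsets = bone_lengths[:, None] * directions
    return np.cumsum(offsets, axis=0)  # shape (n_joints, 2)

def loss_and_grad(angles, bone_lengths, target):
    """Squared joint-position error and its analytic gradient w.r.t. the angles."""
    joints = forward_kinematics(angles, bone_lengths)
    residual = joints - target
    loss = 0.5 * np.sum(residual ** 2)
    grad = np.zeros_like(angles)
    for k in range(len(angles)):
        # Changing angle k rotates every downstream joint about the pivot of bone k.
        pivot = joints[k - 1] if k > 0 else np.zeros(2)
        rel = joints[k:] - pivot
        d_joints = np.stack([-rel[:, 1], rel[:, 0]], axis=1)  # d(joint_i)/d(angle_k)
        grad[k] = np.sum(residual[k:] * d_joints)
    return loss, grad

# Hypothetical toy setup: recover a pose from its joint positions.
bone_lengths = np.array([1.0, 0.8, 0.5])
target = forward_kinematics(np.array([0.3, -0.5, 0.9]), bone_lengths)

angles = np.zeros(3)
initial_loss, _ = loss_and_grad(angles, bone_lengths, target)
for _ in range(2000):
    _, grad = loss_and_grad(angles, bone_lengths, target)
    angles -= 0.05 * grad  # assumed step size
final_loss, _ = loss_and_grad(angles, bone_lengths, target)

print("initial loss:", initial_loss)
print("final loss:", final_loss)
```

Because the loss is defined on joint positions while the optimized variables are angles, every intermediate estimate is a geometrically valid chain, so no post-processing is needed to restore bone lengths.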