Understanding observations of interacting objects requires one to reason about qualitative scene dynamics. For example, on observing a hand lifting a can, we may infer that an 'active' hand is applying an upward force (by grasping) to lift a 'passive' can. We present an implemented computational theory that derives such dynamic descriptions directly from camera input. Our approach is based on an analysis of the Newtonian mechanics of a simplified scene model. Interpretations are expressed as assertions about the kinematic and dynamic properties of the scene. The feasibility of an interpretation with respect to Newtonian mechanics can be determined by a reduction to linear programming. Finally, to select plausible interpretations, multiple feasible solutions are compared using a preference hierarchy. We provide computational examples to demonstrate that our model is sufficiently rich to describe a wide variety of image sequences.
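To illustrate what a feasibility test over linear force constraints looks like, here is a minimal sketch. It is not the authors' formulation. It assumes a single vertical force balance on the can with zero acceleration and the can's weight normalized to 1. The unknowns `f_hand` and `f_table` and the two interpretations `hand_lifts` and `unsupported` are hypothetical. In place of an LP solver it uses Fourier-Motzkin elimination, which exactly decides whether a system of linear inequalities has a real solution.

```python
from fractions import Fraction

# A constraint (a, b) encodes the linear inequality sum(a[i] * x[i]) <= b.

def le(a, b):
    return [(a, b)]

def ge(a, b):
    # a.x >= b is rewritten as (-a).x <= -b
    return [([-c for c in a], -b)]

def eq(a, b):
    # a.x == b is the pair a.x <= b and a.x >= b
    return le(a, b) + ge(a, b)

def feasible(constraints, n):
    """Decide whether the system has a real solution in n variables,
    using Fourier-Motzkin elimination with exact rational arithmetic."""
    rows = [([Fraction(c) for c in a], Fraction(b)) for a, b in constraints]
    for j in range(n):
        pos = [r for r in rows if r[0][j] > 0]   # upper bounds on x_j
        neg = [r for r in rows if r[0][j] < 0]   # lower bounds on x_j
        new_rows = [r for r in rows if r[0][j] == 0]
        # Each (upper, lower) pair yields one combined row in which x_j cancels.
        for ap, bp in pos:
            for an, bn in neg:
                sp, sn = -an[j], ap[j]  # both scale factors are positive
                a = [sp * u + sn * v for u, v in zip(ap, an)]
                new_rows.append((a, sp * bp + sn * bn))
        rows = new_rows
    # Every variable has been eliminated, so each row now reads 0 <= b.
    return all(b >= 0 for _, b in rows)

# Hypothetical scene: x = [f_hand, f_table], upward forces on the can.
WEIGHT = 1

# Interpretation A: the hand grasps and supports the can, which is off the table.
hand_lifts = (eq([1, 1], WEIGHT)   # f_hand + f_table - weight = 0 (no acceleration)
              + eq([0, 1], 0)      # no contact with the table
              + ge([0, 1], 0))     # a table contact could only push upward

# Interpretation B: the can is off the table and nothing touches it.
unsupported = (eq([1, 1], WEIGHT)
               + eq([1, 0], 0)     # no hand contact
               + eq([0, 1], 0))    # no table contact

print(feasible(hand_lifts, 2))   # True
print(feasible(unsupported, 2))  # False
```

Interpretation B is rejected because no combination of the permitted forces balances gravity. Fourier-Motzkin elimination can grow exponentially in the number of variables, so a practical system would pass the same constraint rows to a simplex or interior-point LP solver instead.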