One of the ultimate challenges of computer vision is the semantic understanding of video. Many efforts at detecting events in video have focused on structured sequences such as sports or news broadcasts. However, even seemingly freeform media such as feature films have inherent structure and established production codes. Over the last century, film theorists have developed the principles of continuity editing. One tenet of continuity editing is known as match framing: for a shot boundary to appear seamless, the viewer's focus of attention should not have to move very far from one shot to the next. Filmmakers generally adhere to the continuity editing guidelines so that audiences maintain their suspension of disbelief. Sometimes, however, a filmmaker violates continuity deliberately to jar the viewer, for example during action scenes or moments of high intensity. By detecting violations of the continuity editing principles, it is possible to locate portions of a film that the filmmaker intends to portray as different from the rest of the film. We have developed a method for automatically detecting violations of the match framing principle that fuses film theory, psychophysical modeling, image morphology and pattern recognition. First, shot detection is performed on the entire film. Next, we compute a saliency map for one frame before and one frame after each shot boundary. We then treat each saliency map as a distribution and estimate a 3-component Gaussian Mixture Model of the salient peaks. Finally, by comparing the two distributions, we estimate how much the viewer's eye will need to move from one shot to the next. Experiments on a small corpus of full-length movies demonstrate a correlation between match framing violations and plot.
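
The last two steps of the pipeline (fitting a 3-component mixture to each saliency map and comparing the two mixtures) can be illustrated with a short sketch. It is not the authors' implementation: the abstract does not name the saliency model, the fitting procedure, or the comparison measure. The sketch assumes the two saliency maps are already available as non-negative 2-D arrays of the same size, fits each mixture with a saliency-weighted EM loop, and compares them with the Hellinger distance, which is an arbitrary choice made here. The function names `gauss_pdf`, `fit_weighted_gmm`, `mixture_on_grid` and `match_frame_distance` are hypothetical.

```python
import numpy as np

def gauss_pdf(pts, mean, cov):
    """Density of a 2-D Gaussian evaluated at each row of pts."""
    d = pts - mean
    inv = np.linalg.inv(cov)
    norm = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))
    return norm * np.exp(-0.5 * np.einsum("ni,ij,nj->n", d, inv, d))

def pixel_grid(shape):
    """(x, y) coordinates of every pixel as an (N, 2) float array."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    return np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)

def fit_weighted_gmm(saliency, k=3, iters=50, seed=0):
    """Fit a k-component GMM to pixel positions, weighted by saliency (EM)."""
    h, w = saliency.shape
    pts = pixel_grid(saliency.shape)
    wts = saliency.ravel().astype(float) + 1e-12   # avoid all-zero weights
    wts /= wts.sum()                               # treat the map as a distribution
    rng = np.random.default_rng(seed)
    # Assumption: initialise the means at pixels sampled in proportion to saliency.
    means = pts[rng.choice(len(pts), size=k, replace=False, p=wts)]
    covs = np.array([np.eye(2) * (max(h, w) / 4.0) ** 2 for _ in range(k)])
    pis = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of each component for each pixel.
        dens = np.stack(
            [pis[j] * gauss_pdf(pts, means[j], covs[j]) for j in range(k)], axis=1
        )
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: every pixel counts in proportion to its saliency.
        wr = resp * wts[:, None]
        nk = wr.sum(axis=0) + 1e-12
        pis = nk / nk.sum()
        means = (wr.T @ pts) / nk[:, None]
        for j in range(k):
            d = pts - means[j]
            # Small ridge term keeps each covariance invertible.
            covs[j] = (wr[:, j, None] * d).T @ d / nk[j] + 1e-3 * np.eye(2)
    return pis, means, covs

def mixture_on_grid(params, shape):
    """Evaluate a fitted mixture on the pixel grid, normalised to sum to 1."""
    pis, means, covs = params
    pts = pixel_grid(shape)
    dens = sum(p * gauss_pdf(pts, m, c) for p, m, c in zip(pis, means, covs))
    return dens / dens.sum()

def match_frame_distance(sal_before, sal_after, k=3):
    """Hellinger distance in [0, 1] between the two fitted mixtures."""
    p = mixture_on_grid(fit_weighted_gmm(sal_before, k), sal_before.shape)
    q = mixture_on_grid(fit_weighted_gmm(sal_after, k), sal_after.shape)
    return float(np.sqrt(max(0.0, 1.0 - np.sum(np.sqrt(p * q)))))
```

Under these assumptions, a value near 0 means the salient regions on both sides of the cut overlap, which is consistent with match framing, while a value near 1 means attention has to shift to a different part of the frame. Any threshold for flagging a violation would have to be tuned on real footage.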