A method of determining a composite action from a video clip, using a conditional random field (CRF), the method includes determining a plurality of features from the video clip, each of the features having a corresponding temporal segment from the video clip. The method may continue by determining, for each of the temporal segments corresponding to one of the features, an initial estimate of an action unit label from a corresponding unary potential function, the corresponding unary potential function having as ordered input the plurality of features from a current temporal segment and at least one other of the temporal segments. The method may further include determining the composite action by jointly optimising the initial estimate of the action unit labels.
展开▼