Methods for automated facial expression recognition - identifying faces as happy, sad, angry, etc. - typically rely on the classification of features extracted from images. These features, designed to encode shape and texture information, depend on both (1) the expression an individual is making and (2) the individual's physical characteristics and the lighting conditions of the image. To reduce the effect of (2), a common strategy is to establish a "baseline" for an individual and subtract out that individual's neutral feature. This extra neutral-feature information is often unavailable, in particular for in-the-wild, real-time classification of a previously unseen subject. Thus, to implement "neutral subtraction," one must estimate the individual's neutral feature. Existing methods for doing so are susceptible to class imbalance at test time (e.g., estimating the neutral feature by averaging over all of an individual's features), require training a more complex model specific to the individual, or are restricted to features computed entirely from tracked landmark points (exploiting a subset of "stable points" that move little as an individual emotes). We extend neutral subtraction to other computer vision feature spaces as a method to correct for inter-face and lighting variance. We further propose a simple, real-time method that is robust to class imbalance and in principle works over a wide class of feature choices. We test this method on feature extraction techniques that yield high baseline accuracy without neutral subtraction (97% on the Extended Cohn-Kanade Dataset). We find that on difficult classification tasks our method recovers almost 2/3 of the ~8% gain shown by a "cheating" neutral-subtracted classifier, which uses examples that have been labeled as neutral; we validate with both HOG and SIFT features.
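To make the neutral-subtraction idea concrete, the Python sketch below implements the "cheating" variant described above: it averages the features of examples labeled neutral to form a per-subject baseline, then subtracts that baseline from every feature vector before classification. This is not the paper's real-time estimator; the HOG parameters, the averaging step, and all function names are illustrative assumptions.

    import numpy as np
    from skimage.feature import hog  # HOG descriptor from scikit-image

    def extract_hog(face_image):
        """HOG descriptor for a cropped, grayscale face image.

        The orientation/cell/block parameters are illustrative,
        not taken from the paper.
        """
        return hog(face_image, orientations=8,
                   pixels_per_cell=(16, 16), cells_per_block=(2, 2))

    def cheating_neutral(features, labels):
        """"Cheating" baseline: average the features of examples
        labeled neutral. This requires neutral labels at test time,
        which the paper's proposed method avoids needing.
        """
        neutrals = [f for f, y in zip(features, labels) if y == "neutral"]
        return np.mean(neutrals, axis=0)

    def neutral_subtract(features, neutral_feature):
        """Subtract the per-subject neutral baseline from each vector."""
        return [f - neutral_feature for f in features]

    # Usage sketch (subject_frames and labels are assumed inputs):
    # features = [extract_hog(img) for img in subject_frames]
    # corrected = neutral_subtract(features,
    #                              cheating_neutral(features, labels))
    # ...then train/evaluate any standard classifier on `corrected`.

Because the subtraction is a fixed per-subject offset in feature space, the same pattern applies unchanged to SIFT-based or other dense feature representations.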