A method, computer-readable medium, and system are disclosed for sequential multitasking to generate coordinates of landmarks within images. The landmark locations may be identified on an image of a human face and used for emotion recognition, face identity verification, instant tracking, pose estimation, etc. A neural network model processes input image data to generate pixel-level probability estimates for each landmark in the input image data, and a soft-argmax function calculates the predicted coordinates of each landmark from those pixel-level probability estimates.
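The soft-argmax step can be illustrated with a minimal NumPy sketch. It is not the disclosed implementation: the function name `soft_argmax`, the `(num_landmarks, height, width)` score layout, and the sharpening factor `beta` are assumptions made for illustration. The sketch normalizes each landmark's score map into a probability distribution over pixels and returns the probability-weighted mean pixel position, which, unlike a hard argmax, is differentiable with respect to the scores.

```python
import numpy as np


def soft_argmax(heatmaps, beta=1.0):
    """Expected (x, y) pixel coordinates for each landmark score map.

    heatmaps: array of shape (num_landmarks, height, width) holding raw
              per-pixel scores (hypothetical layout, assumed here).
    beta:     assumed sharpening factor; larger values approach a hard argmax.
    Returns an array of shape (num_landmarks, 2) with (x, y) per landmark.
    """
    heatmaps = np.asarray(heatmaps, dtype=np.float64)
    num_landmarks, height, width = heatmaps.shape

    # Softmax over all pixels of each map (max subtracted for stability).
    flat = beta * heatmaps.reshape(num_landmarks, -1)
    flat = flat - flat.max(axis=1, keepdims=True)
    probs = np.exp(flat)
    probs /= probs.sum(axis=1, keepdims=True)
    probs = probs.reshape(num_landmarks, height, width)

    # Marginal distributions over columns (x) and rows (y).
    px = probs.sum(axis=1)  # shape (num_landmarks, width)
    py = probs.sum(axis=2)  # shape (num_landmarks, height)

    # Expected coordinate = sum of index * probability.
    x = (px * np.arange(width)).sum(axis=1)
    y = (py * np.arange(height)).sum(axis=1)
    return np.stack([x, y], axis=1)


# Usage: one landmark with a strong peak at row 2, column 5 of an 8x8 map.
scores = np.zeros((1, 8, 8))
scores[0, 2, 5] = 50.0
coords = soft_argmax(scores)  # approximately [[5.0, 2.0]]
```

Because the output is an expectation rather than a discrete index, a flat score map yields the image center, and the predicted coordinates can take sub-pixel values.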