Weakly Supervised Facial Action Unit Recognition Through Adversarial Training

Publication
2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018

Current works on facial action unit (AU) recognition typically require fully AU-annotated facial images for supervised AU classifier training. AU annotation is a time-consuming, expensive, and error-prone process. While AUs are hard to annotate, facial expression is relatively easy to label. Furthermore, there exist strong probabilistic dependencies between expressions and AUs, as well as dependencies among AUs. Such dependencies are referred to as domain knowledge. In this paper, we propose a novel AU recognition method that learns AU classifiers from domain knowledge and expression-annotated facial images through adversarial training. Specifically, we first generate pseudo AU labels according to the probabilistic dependencies between expressions and AUs, as well as the correlations among AUs summarized from domain knowledge. We then propose a weakly supervised AU recognition method via an adversarial process, in which we simultaneously train two models: a recognition model R, which learns AU classifiers, and a discrimination model D, which estimates the probability that a sample of AU labels was generated from domain knowledge rather than recognized by R. The training procedure for R maximizes the probability of D making a mistake. By leveraging this adversarial mechanism, the distribution of recognized AUs is driven close to the AU prior distribution given by domain knowledge. Furthermore, the proposed weakly supervised AU recognition can be extended to semi-supervised learning scenarios with partially AU-annotated images. Experimental results on three benchmark databases demonstrate that the proposed method successfully leverages the summarized domain knowledge for weakly supervised AU classifier learning through an adversarial process, and thus achieves state-of-the-art performance.
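As a rough illustration of the pseudo-label step, the sketch below draws binary AU vectors from an expression-conditioned prior P(AU = 1 | expression). The prior values, AU indices, and function name are hypothetical placeholders rather than the paper's tables, and for simplicity the AUs are sampled independently, whereas the paper also encodes correlations among AUs.

```python
# Minimal sketch of generating pseudo ("real") AU labels from domain
# knowledge. The prior table is a hypothetical placeholder, not the
# paper's numbers; the paper additionally models AU co-occurrence.
import torch

AU_PRIOR = {  # P(AU = 1 | expression) for, e.g., AU1, AU6, AU12, AU4
    "happiness": torch.tensor([0.05, 0.90, 0.85, 0.10]),
    "sadness":   torch.tensor([0.80, 0.05, 0.05, 0.70]),
}

def sample_pseudo_au_labels(expression: str, n_samples: int) -> torch.Tensor:
    """Draw binary AU label vectors from the expression-conditioned prior."""
    p = AU_PRIOR[expression]
    return (torch.rand(n_samples, p.numel()) < p).float()

real_aus = sample_pseudo_au_labels("happiness", n_samples=8)
```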

Fig. The framework of RAN. In Part 1, the facial feature X is fed into the recognizer R to produce the "fake" AU vector; the "real" AU data generated in Section 2.2 form Part 2. In Part 3, P discriminators are trained, and "real" or "fake" AU data are fed into the discriminator corresponding to the same expression label. See text for details.
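To make the adversarial process concrete, here is a minimal sketch of one training step, assuming one discriminator per expression class as in the figure. The network sizes, optimizers, and binary cross-entropy formulation are assumptions for illustration, not the paper's exact architecture.

```python
# Minimal sketch of one adversarial step: R recognizes AUs from features,
# D[expr] distinguishes pseudo labels from recognized ones. Shapes,
# optimizers, and the BCE losses are illustrative assumptions.
import torch
import torch.nn as nn

FEAT_DIM, N_AUS, N_EXPR = 128, 4, 2

R = nn.Sequential(nn.Linear(FEAT_DIM, 64), nn.ReLU(),
                  nn.Linear(64, N_AUS), nn.Sigmoid())        # recognizer
D = nn.ModuleList([nn.Sequential(nn.Linear(N_AUS, 32), nn.ReLU(),
                                 nn.Linear(32, 1), nn.Sigmoid())
                   for _ in range(N_EXPR)])                  # per-expression discriminators
opt_r = torch.optim.Adam(R.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCELoss()

def adversarial_step(x, expr, real_aus):
    """x: features sharing one expression label; real_aus: pseudo AU labels."""
    d = D[expr]
    fake_aus = R(x)
    # Discriminator: pseudo labels from domain knowledge -> 1, recognized -> 0.
    loss_d = bce(d(real_aus), torch.ones(len(real_aus), 1)) + \
             bce(d(fake_aus.detach()), torch.zeros(len(x), 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Recognizer: maximize the probability of D making a mistake.
    loss_r = bce(d(R(x)), torch.ones(len(x), 1))
    opt_r.zero_grad(); loss_r.backward(); opt_r.step()
    return loss_d.item(), loss_r.item()
```

In full training, R would additionally receive a supervised loss on any AU-annotated images, which is how the semi-supervised extension mentioned in the abstract would fit in.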
Shangfei Wang
Professor of Artificial Intelligence

My research interests include Pattern Recognition, Affective Computing, Probabilistic Graphical Models, and Computational Intelligence.
