Facial Action Unit Recognition Enhanced by Text Descriptions of FACS

Abstract

Although the textual descriptions of facial action units (AUs) provide crucial semantic knowledge for representation learning from facial images, they have not been fully explored for facial action unit recognition. In this paper, we propose a method that effectively exploits the knowledge contained in AU descriptions to enhance AU recognition. Specifically, the proposed method consists of three components, i.e., an AU recognition network, global representation alignment, and AU representation alignment. The AU recognition network extracts global and AU-specific features from images for AU prediction. To fully leverage AU textual descriptions, we design a two-level representation alignment for AU recognition. The global representation alignment component reduces the distance between the global facial features and their corresponding positive global embeddings extracted from textual descriptions. Then, the AU-specific features are aligned with the positive AU textual embeddings by the AU representation alignment component. Negative textual embedding generation strategies are also designed to further boost the two-level representation alignment. Through the two-level alignment, AU textual descriptions guide the image representation learning of the AU recognition network. Experiments on two benchmark datasets and one in-the-wild dataset demonstrate the efficacy of the description-enhanced AU recognition method compared with state-of-the-art methods.
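The alignment described above pulls an image feature toward its positive textual embedding and pushes it away from generated negative embeddings. A minimal sketch of such a contrastive alignment objective is shown below; the function name, the InfoNCE-style formulation, and the temperature value are illustrative assumptions, not the paper's exact loss.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def alignment_loss(img_feat, pos_text, neg_texts, tau=0.07):
    """InfoNCE-style contrastive alignment (illustrative sketch).

    Pulls the image feature toward the positive text embedding and
    pushes it away from the negative text embeddings, as in the
    two-level alignment described in the abstract.
    """
    sims = [cosine(img_feat, pos_text)] + [cosine(img_feat, t) for t in neg_texts]
    logits = np.array(sims) / tau
    logits -= logits.max()  # numerical stability before softmax
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(-np.log(probs[0]))  # negative log-probability of the positive
```

The same objective can be applied at both levels: once between the global facial feature and the global description embedding, and once between each AU-specific feature and its AU description embedding.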

Publication
IEEE Transactions on Affective Computing
Shangfei Wang
Professor of Artificial Intelligence

My research interests include Pattern Recognition, Affective Computing, Probabilistic Graphical Models, and Computational Intelligence.