UniFaRN: Unified Transformer for Facial Reaction Generation

Abstract

We propose the Unified Transformer for Facial Reaction GeneratioN (UniFaRN), a framework for facial reaction generation in dyadic interactions. Given the video and audio of one participant, the task is to generate the facial reactions of the other. The challenge lies in fusing the multi-modal inputs and in balancing the appropriateness and diversity of the generated reactions. We adopt the Transformer architecture, leveraging its flexibility in handling multi-modal data and its ability to control the generation process. By capturing the correlations between multi-modal inputs and outputs with unified layers and balancing performance through sampling strategies, we won first place in the REACT2023 challenge. Code is available at https://github.com/lc150303/REACT23_Challenge
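
To make the two ideas in the abstract concrete, below is a minimal, hypothetical sketch (not the released implementation; see the repository above for the actual code). It shows a single stack of unified Transformer layers attending jointly over projected audio and video tokens together with previously generated reaction tokens, and temperature/top-k sampling at decoding time to trade appropriateness against diversity. All dimensions, module names, and the discrete reaction vocabulary here are illustrative assumptions.

```python
# Illustrative sketch of (1) unified Transformer layers fusing multi-modal
# inputs and (2) sampling controls for the appropriateness/diversity
# trade-off. Feature sizes and the reaction token vocabulary are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnifiedReactionSketch(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2, n_reaction_tokens=128):
        super().__init__()
        # Project each modality into a shared space so one unified stack
        # of Transformer layers can fuse them via self-attention.
        self.audio_proj = nn.Linear(80, d_model)    # e.g. log-mel features
        self.video_proj = nn.Linear(512, d_model)   # e.g. frame embeddings
        self.reaction_emb = nn.Embedding(n_reaction_tokens, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.unified = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_reaction_tokens)

    def forward(self, audio, video, reactions):
        # Concatenate all modalities along the time axis; the unified layers
        # capture cross-modal correlations between inputs and outputs.
        tokens = torch.cat([
            self.audio_proj(audio),
            self.video_proj(video),
            self.reaction_emb(reactions),
        ], dim=1)
        fused = self.unified(tokens)
        # Predict the next reaction token from the last position.
        return self.head(fused[:, -1])

@torch.no_grad()
def sample_reaction(model, audio, video, steps=16, temperature=1.0, top_k=10):
    """Autoregressive decoding; higher temperature / larger top_k favor
    diversity, lower values favor appropriateness."""
    reactions = torch.zeros(audio.size(0), 1, dtype=torch.long)  # start token
    for _ in range(steps):
        logits = model(audio, video, reactions) / temperature
        topv, topi = logits.topk(top_k, dim=-1)
        probs = F.softmax(topv, dim=-1)
        next_tok = topi.gather(-1, torch.multinomial(probs, 1))
        reactions = torch.cat([reactions, next_tok], dim=1)
    return reactions[:, 1:]

model = UnifiedReactionSketch()
audio = torch.randn(1, 20, 80)   # (batch, time, mel bins)
video = torch.randn(1, 20, 512)  # (batch, frames, feature dim)
print(sample_reaction(model, audio, video).shape)  # torch.Size([1, 16])
```

In this sketch, lowering the temperature or shrinking top_k concentrates probability on the most likely reactions (appropriateness), while raising them spreads probability mass over more candidates (diversity).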

Publication
The 31st ACM International Conference on Multimedia (ACM MM 2023)
Shangfei Wang
Professor of Artificial Intelligence

My research interests include Pattern Recognition, Affective Computing, Probabilistic Graphical Models, and Computational Intelligence.
