UniFaRN: Unified Transformer for Facial Reaction Generation

Abstract

We propose the Unified Transformer for Facial Reaction GeneratioN (UniFaRN), a framework for facial reaction generation in dyadic interactions. Given the video and audio of one participant, the task is to generate the facial reactions of the other. The challenge lies in fusing the multi-modal inputs and in balancing the appropriateness and diversity of the generated reactions. We adopt the Transformer architecture, leveraging its flexibility in handling multi-modal data and its ability to control the generation process. By capturing the correlations between multi-modal inputs and outputs with unified layers and balancing performance through sampling strategies, we won first place in the REACT2023 challenge. Code is available at https://github.com/lc150303/REACT23_Challenge
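
To make the two ideas in the abstract concrete, below is a minimal, hypothetical sketch (not the released implementation; see the repository above for the actual code). It shows a single stack of unified Transformer layers attending jointly over projected audio and video tokens together with previously generated reaction tokens, and temperature/top-k sampling at decoding time to trade appropriateness against diversity. All dimensions, module names, and the discrete reaction vocabulary here are illustrative assumptions.

```python
# Illustrative sketch of (1) unified Transformer layers fusing multi-modal
# inputs and (2) sampling controls for the appropriateness/diversity
# trade-off. Feature sizes and the reaction token vocabulary are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnifiedReactionSketch(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2, n_reaction_tokens=128):
        super().__init__()
        # Project each modality into a shared space so one unified stack
        # of Transformer layers can fuse them via self-attention.
        self.audio_proj = nn.Linear(80, d_model)    # e.g. log-mel features
        self.video_proj = nn.Linear(512, d_model)   # e.g. frame embeddings
        self.reaction_emb = nn.Embedding(n_reaction_tokens, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.unified = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_reaction_tokens)

    def forward(self, audio, video, reactions):
        # Concatenate all modalities along the time axis; the unified layers
        # capture cross-modal correlations between inputs and outputs.
        tokens = torch.cat([
            self.audio_proj(audio),
            self.video_proj(video),
            self.reaction_emb(reactions),
        ], dim=1)
        fused = self.unified(tokens)
        # Predict the next reaction token from the last position.
        return self.head(fused[:, -1])

@torch.no_grad()
def sample_reaction(model, audio, video, steps=16, temperature=1.0, top_k=10):
    """Autoregressive decoding; higher temperature / larger top_k favor
    diversity, lower values favor appropriateness."""
    reactions = torch.zeros(audio.size(0), 1, dtype=torch.long)  # start token
    for _ in range(steps):
        logits = model(audio, video, reactions) / temperature
        topv, topi = logits.topk(top_k, dim=-1)
        probs = F.softmax(topv, dim=-1)
        next_tok = topi.gather(-1, torch.multinomial(probs, 1))
        reactions = torch.cat([reactions, next_tok], dim=1)
    return reactions[:, 1:]

model = UnifiedReactionSketch()
audio = torch.randn(1, 20, 80)   # (batch, time, mel bins)
video = torch.randn(1, 20, 512)  # (batch, frames, feature dim)
print(sample_reaction(model, audio, video).shape)  # torch.Size([1, 16])
```

In this sketch, lowering the temperature or shrinking top_k concentrates probability on the most likely reactions (appropriateness), while raising them spreads probability mass over more candidates (diversity).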

Publication
The 31st ACM International Conference on Multimedia (ACM MM 2023)
Shangfei Wang
Professor of Artificial Intelligence

My research interests include Pattern Recognition, Affective Computing, Probabilistic Graphical Models, and Computational Intelligence.
