Capturing Emotion Distribution for Multimedia Emotion Tagging

Publication
IEEE Transactions on Affective Computing

Multimedia collections usually induce multiple emotions in audiences. The data distribution of multiple emotions can be leveraged to facilitate the learning process of emotion tagging, yet has not been thoroughly explored. To address this, we propose adversarial learning to fully capture emotion distributions for emotion tagging of multimedia data. The proposed multimedia emotion tagging approach includes an emotion classifier and a discriminator. The emotion classifier predicts emotion labels of multimedia data from their content. The discriminator distinguishes the predicted emotion labels from the ground truth labels. The emotion classifier and the discriminator are trained simultaneously in competition with each other. By jointly minimizing the traditional supervised loss and maximizing the distribution similarity between the predicted emotion labels and the ground truth emotion labels, the proposed multimedia emotion tagging approach successfully captures both the mapping function between multimedia content and emotion labels as well as prior distribution in emotion labels, and thus achieves state-of-the-art performance for multiple emotion tagging, as demonstrated by the experimental results on four benchmark databases.

Fig. The framework of the proposed multi-emotion tagging model. C is the multiple emotion tag classifier, and D is the multiple emotion tag discriminator. C(x) is the multiple emotion prediction for feature vector x. y is the ground truth multiple emotion label of x. y 0 is the real multiple emotion label from ground truth label set. The dotted line indicates that there is an one-to-one correspondence between C(x) and y. See text for details.
Fig. The framework of the proposed multi-emotion tagging model. C is the multiple emotion tag classifier, and D is the multiple emotion tag discriminator. C(x) is the multiple emotion prediction for feature vector x. y is the ground truth multiple emotion label of x. y 0 is the real multiple emotion label from ground truth label set. The dotted line indicates that there is an one-to-one correspondence between C(x) and y. See text for details.
Shangfei Wang
Shangfei Wang
Professor of Artificial Intelligence

My research interests include Pattern Recognition, Affective Computing, Probabilistic Graphical Models, Computation Intelligence.

Zhiwei Xu
Zhiwei Xu
Algorithm Engineer

Related