VACD

Overview

VACD is a large-scale dataset designed for video affective content analysis, with an accompanying textual description for every clip. It covers three modalities: visual, audio, and text. Each video clip is annotated with emotion labels drawn from ten categories, as well as valence and arousal values. Comprehensive statistical analyses demonstrate that VACD exhibits high emotional density and strong annotation consistency.

Dataset Description

VACD consists of 19,894 video clips sourced from 140 movies, each accompanied by a corresponding textual description. The videos originate from films produced for visually impaired audiences on the Xigua Video platform. We segmented these films into shorter clips while preserving the original movie audio. In addition, the audio narration designed for visually impaired viewers was transcribed into text and used as the textual modality in VACD.
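For illustration, the sketch below shows how a single VACD clip might be represented in code, with one field per modality. The field names, file paths, and example description are assumptions made for this sketch and do not reflect the dataset's actual release format.

    from dataclasses import dataclass

    @dataclass
    class VACDClip:
        """Hypothetical container for one VACD sample (all names are illustrative)."""
        clip_id: str      # identifier of the video clip
        video_path: str   # visual modality: the segmented movie clip
        audio_path: str   # audio modality: the original movie audio kept during segmentation
        description: str  # text modality: transcribed narration for visually impaired viewers

    # Example instance; the path layout and description text are placeholders.
    sample = VACDClip(
        clip_id="movie012_clip0345",
        video_path="clips/movie012_clip0345.mp4",
        audio_path="clips/movie012_clip0345.wav",
        description="She lowers her head, hiding the tears welling up in her eyes.",
    )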

The overall dataset construction process is illustrated in Figure 1. As shown in Figure 2, the transcribed descriptions contain rich emotional content—not only common expressions of facial emotion, but also deeper depictions of internal psychological states.

Fig 1. Overall dataset construction process.

Fig 2. Examples of the transcribed textual descriptions.

Annotations and Labels

We annotated each clip in the dataset with Valence, Arousal, and primary emotion labels drawn from ten categories. Consistency analysis of the annotations yielded results that align well with existing findings in psychological research. An illustrative encoding of a single clip's annotation is sketched after the list below.

  • Valence: Valence takes the values -1, 0, and 1, indicating that the overall affect of the video is negative, neutral, or positive, respectively.
  • Arousal: Arousal takes the values 0, 1, and 2, indicating that the emotional intensity of the video is calm, relatively intense, or very intense, respectively.
  • Primary Emotion: Among the ten basic emotions we selected, six are derived from Plutchik’s Wheel of Emotions: Anger, Surprise, Fear, Joy, Sadness, and Disgust. The remaining four—Trust, Happiness, Shame, and Amusement—were included to better capture the range of emotional expressions commonly present in video content.
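As a concrete illustration of this label scheme, the following sketch encodes one clip's annotation in Python. The variable names, the multi-label handling of the primary emotions, and the example values are assumptions made for this sketch; consult the released annotation files for the actual format.

    # Hypothetical annotation record for one clip (field names are illustrative).
    PRIMARY_EMOTIONS = [
        "Anger", "Surprise", "Fear", "Joy", "Sadness",
        "Disgust", "Trust", "Happiness", "Shame", "Amusement",
    ]

    annotation = {
        "clip_id": "movie012_clip0345",  # placeholder identifier
        "valence": 1,                    # -1 = negative, 0 = neutral, 1 = positive
        "arousal": 2,                    # 0 = calm, 1 = relatively intense, 2 = very intense
        "emotions": ["Joy", "Surprise"], # subset of the ten primary emotion categories
    }

    # Basic sanity checks matching the value ranges described above.
    assert annotation["valence"] in (-1, 0, 1)
    assert annotation["arousal"] in (0, 1, 2)
    assert all(e in PRIMARY_EMOTIONS for e in annotation["emotions"])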

Appendix

The appendix for the dataset paper can be accessed via this file. It contains the following content:

  • Introductions to datasets related to VACD.
  • Introductions to the three baseline models used in the paper.
  • Detailed experimental parameters for the three baseline models.
  • A subset of statistical results on the dataset, including:
    • Statistics on the age and gender distribution of annotators.
    • Statistics on the duration distribution of video clips.
    • The proportion of videos annotated with varying numbers of emotion labels.
    • Statistical results of the Valence and Arousal annotations.
  • Notes on privacy and copyright considerations during dataset annotation.

License and Agreement

To access the VACD dataset, researchers need to print the user license agreement, sign it, and send it to us. Please note that the agreement must be signed by an individual holding an official position at their institution; this ensures that the signatory has the authority and responsibility to adhere to the terms of use and licensing conditions outlined in the agreement. After we verify the signed agreement, we will provide a link to download the dataset.

Contact Us

For any inquiries about the VACD dataset, please contact:

Bingzhao Cai (responsible for the dataset): cbz_2020@mail.ustc.edu.cn

Shangfei Wang: sfwang@ustc.edu.cn

Citation

Citation information will be provided once the accompanying dataset paper is published.