VAD

Overview

The VAD dataset is a large-scale dataset for video affective content analysis. It addresses the shortage of datasets in this field: VAD provides rich emotional labels and multiple modalities, with the aim of promoting the development of related tasks.

Dataset Description

The VAD dataset consists of 19,267 carefully segmented video clips from user-generated videos on the BiliBili website. The clips are annotated via a crowdsourcing platform with discrete valence, arousal, and primary-emotion labels, as well as comparisons of valence and arousal between consecutive video clips. Unlike previous datasets, which contain only video clips, VAD also provides danmu, the real-time comments that users post while watching a video, as shown in Fig 1. Danmu provides extra information for video affective content analysis.

Fig 1. Original video from BiliBili with danmu in our VAD dataset.

Annotations and Labels

The annotation process assigns valence, arousal, and a primary emotion to the intended affect of each segmented video clip, and simultaneously compares two adjacent clips from the same video with respect to valence and arousal, as shown in Fig 2.

  • Valence: The valence (V) is annotated on three discrete levels ranging from 1 to 3, representing negative, neutral, and positive, respectively.
  • Arousal: The arousal (A) is annotated on three discrete levels ranging from 1 to 3, representing calm, excited, and very excited, respectively.
  • Primary Emotion: The primary emotion (PE) contains thirteen classes: eight basic emotions from Plutchik’s wheel and five other emotions. The eight basic emotions are anger, disgust, fear, happiness, surprise, trust, anticipation, and sadness. The five other emotions are love, pride, satisfaction, horror, and neutral. In particular, following the emotion-combination theory of Plutchik’s wheel, love is a combination of joy and trust, and pride is a combination of anger and joy.
  • Valence Comparison: The valence comparison (VC) is set to three discrete levels from -1 to 1, representing more negative, similar, and more positive valence, respectively.
  • Arousal Comparison: The arousal comparison (AC) is also set to three discrete levels from -1 to 1, representing lower, similar, and higher arousal, respectively.

Fig 2. Dataset annotation.
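The five labels above form a simple per-clip annotation record. The following sketch shows one way such a record could be represented and validated in Python; the class name, field names, and value mappings are illustrative assumptions, not the dataset's actual file schema.

```python
from dataclasses import dataclass

# Illustrative mappings for the discrete label levels described above.
VALENCE = {1: "negative", 2: "neutral", 3: "positive"}
AROUSAL = {1: "calm", 2: "excited", 3: "very excited"}
COMPARISON = {-1: "lower / more negative", 0: "similar", 1: "higher / more positive"}

# Thirteen primary-emotion classes: Plutchik's eight basic emotions
# plus the five additional classes.
PRIMARY_EMOTIONS = {
    "anger", "disgust", "fear", "happiness", "surprise", "trust",
    "anticipation", "sadness",
    "love", "pride", "satisfaction", "horror", "neutral",
}

@dataclass
class ClipAnnotation:
    """Hypothetical annotation record for one video clip."""
    valence: int          # V in {1, 2, 3}
    arousal: int          # A in {1, 2, 3}
    primary_emotion: str  # one of the 13 PE classes
    valence_cmp: int      # VC in {-1, 0, 1}, relative to the previous clip
    arousal_cmp: int      # AC in {-1, 0, 1}, relative to the previous clip

    def validate(self) -> None:
        assert self.valence in VALENCE, "valence must be 1-3"
        assert self.arousal in AROUSAL, "arousal must be 1-3"
        assert self.primary_emotion in PRIMARY_EMOTIONS, "unknown PE class"
        assert self.valence_cmp in COMPARISON, "VC must be -1, 0, or 1"
        assert self.arousal_cmp in COMPARISON, "AC must be -1, 0, or 1"

ann = ClipAnnotation(valence=3, arousal=2, primary_emotion="happiness",
                     valence_cmp=1, arousal_cmp=0)
ann.validate()
print(VALENCE[ann.valence], "/", AROUSAL[ann.arousal])  # positive / excited
```

Researchers loading the released label files should map the raw integer values through tables like these rather than treating the levels as ordinal scores with equal spacing.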

License and Agreement

To access the VAD dataset, researchers need to print the user license agreement, sign it, and send it to us. Please note that the agreement must be signed by an individual holding an official position at their institution. This ensures that the signatory has the authority and responsibility to adhere to the terms of use and licensing conditions outlined in the agreement. After we verify the signed agreement, we will provide a link to download the dataset.

Usage

If the VAD dataset infringes upon the rights of any third party, please contact us. For a description of the directory structure of the dataset provided to users, please refer to this file.

Contact Us

For any inquiries about the VAD dataset, please contact:

Bingzhao Cai (In charge of the database): cbz_2020@mail.ustc.edu.cn

Shangfei Wang: sfwang@ustc.edu.cn

Citation

@article{wang2024vad,
  title={VAD: A Video Affective Dataset with Danmu},
  author={Wang, Shangfei and Li, Xin and Zheng, Feiyi and Pan, Jicai and Li, Xuewei and Chang, Yanan and Li, Qiong and Wang, Jiahe and Xiao, Yufei and others},
  journal={IEEE Transactions on Affective Computing},
  year={2024},
  publisher={IEEE}
}