Although video affective content analysis has great potential in many applications, it has not been thoroughly studied due to the scarcity of datasets. In this paper, we construct a large-scale video affective dataset with danmu (VAD). It consists of 19,267 carefully segmented video clips from user-generated videos. The VAD dataset is annotated via a crowdsourcing platform with discrete valence, arousal, and primary emotions, as well as comparisons of valence and arousal between consecutive video clips. Unlike previous datasets, which include only video clips, our proposed dataset also provides danmu, i.e., real-time comments posted by users as they watch a video. Danmu provides additional information for video affective content analysis. As a preliminary assessment of the usability of our dataset, we analyze inter-annotator consistency for each label using weighted Fleiss' Kappa, regular Fleiss' Kappa, the intraclass correlation coefficient, and percent consensus. In addition, we perform a statistical analysis of the labels and danmu. Finally, we conduct video affective content analysis on our dataset, leveraging three representative methods (i.e., TFN, MulT, and MISA) to provide benchmarks. We also demonstrate that danmu can significantly improve performance on the video affective content analysis task for some labels. Our dataset is available for research purposes.