Recent studies on occluded facial expression recognition typically required fully expression-annotated facial images for training. However, it is time consuming and expensive to collect a large number of facial images with various occlusions and expression annotations. To address this problem, we propose an occluded facial expression recognition method through self-supervised learning, which leverages the profusion of available unlabeled facial images to explore robust facial representations. Specifically, we generate a variety of occluded facial images by randomly adding occlusions to unlabeled facial images. Then we define occlusion prediction as the pretext task for representation learning. We also adopt contrastive learning to make facial representation of a facial image and those of its variations with synthesized occlusions close. Finally, we train an expression classifier as the downstream task. The experimental results on several databases containing both synthesized and realistic occluded facial images demonstrate the superiority of the proposed method over state-of-the-art methods.