Although their constructed representation reflects the supplementary role of thermal infrared images to visible images, it has no direct relationship to the target expression labels. Furthermore, hand-crafted visible and thermal features may not fully capture the expression patterns embedded in the images. Therefore, in this paper, we propose a new deep two-view approach that learns features from both visible and thermal images and leverages the commonality between them for expression recognition.
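For intuition only, the following is a minimal sketch of a generic two-branch (two-view) network that extracts features from visible and thermal inputs and fuses them into a shared representation for expression classification. The layer sizes, the small convolutional backbones, and the concatenation-based fusion are illustrative assumptions and do not depict the architecture proposed in this paper.

```python
# Purely illustrative two-view sketch (not the authors' architecture):
# one branch per modality, fused into a shared representation.
import torch
import torch.nn as nn

class TwoViewNet(nn.Module):
    def __init__(self, num_classes: int = 7):
        super().__init__()
        # One small convolutional branch per view (visible / thermal); sizes are assumptions.
        def branch():
            return nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
        self.visible_branch = branch()
        self.thermal_branch = branch()
        # Shared classifier over the fused (concatenated) view features.
        self.classifier = nn.Linear(32 + 32, num_classes)

    def forward(self, visible: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.visible_branch(visible),
                           self.thermal_branch(thermal)], dim=1)
        return self.classifier(fused)

# Example: a batch of 4 single-channel 64x64 visible/thermal image pairs.
logits = TwoViewNet()(torch.randn(4, 1, 64, 64), torch.randn(4, 1, 64, 64))
print(logits.shape)  # torch.Size([4, 7])
```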