arXiv:1912.05523 [cs.CV]
$\mathbf{G^{3}AN}$: This video does not exist. Disentangling motion and appearance for video generation
Yaohui Wang, Piotr Bilinski, Francois Bremond, Antitza Dantcheva
Published 2019-12-11, Version 1
Creating realistic human videos requires generating appearance and motion simultaneously. To tackle this challenge, we propose the novel spatio-temporal GAN architecture $G^3AN$, which seeks to capture the distribution of high-dimensional video data and to model appearance and motion in a disentangled manner. The latter is achieved by decomposing appearance and motion in a three-stream Generator, where the main stream aims to model spatio-temporal consistency, whereas the two auxiliary streams augment the main stream with multi-scale appearance and motion features, respectively. An extensive quantitative and qualitative analysis shows that our model systematically and significantly outperforms state-of-the-art methods on the facial expression datasets MUG and UvA-NEMO, as well as on the human action datasets Weizmann and UCF101. Additional analysis of the learned latent representations confirms the successful decomposition of appearance and motion.
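The idea of a disentangled appearance/motion representation can be illustrated with a toy sketch. This is not the $G^3AN$ architecture: the names `sample_noise` and `toy_generator`, the noise dimensions, and the additive fusion of the two streams are all assumptions made for illustration. The sketch only shows what disentanglement would mean in practice: holding the appearance code fixed while resampling the motion code should change how frames evolve over time, but not the spatial layout within a frame.

```python
import random


def sample_noise(dim, rng):
    """Draw a latent code of length `dim` from a standard normal distribution."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def toy_generator(z_a, z_m, num_frames=4, size=2):
    """Toy stand-in for a three-stream generator (hypothetical, not the paper's model).

    Returns a video indexed as video[t][y][x].
    """
    # Appearance stream (assumption): a spatial map that depends only on z_a
    # and is shared by every frame.
    appearance = [
        [z_a[(y * size + x) % len(z_a)] for x in range(size)]
        for y in range(size)
    ]
    # Motion stream (assumption): one value per frame that depends only on z_m.
    motion = [z_m[t % len(z_m)] for t in range(num_frames)]
    # Main stream (assumption): additive fusion, so every frame combines
    # appearance and motion features.
    return [
        [[appearance[y][x] + motion[t] for x in range(size)] for y in range(size)]
        for t in range(num_frames)
    ]


rng = random.Random(0)
z_a = sample_noise(4, rng)                               # one appearance code
video_1 = toy_generator(z_a, sample_noise(4, rng))       # same appearance,
video_2 = toy_generator(z_a, sample_noise(4, rng))       # different motion
```

In this toy setting, `video_1` and `video_2` differ frame by frame, yet the difference between any two pixels of the same frame is identical in both videos, because it is determined by `z_a` alone.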