arXiv:1912.05523 [cs.CV]

$\mathbf{G^{3}AN}$: This video does not exist. Disentangling motion and appearance for video generation

Yaohui Wang, Piotr Bilinski, Francois Bremond, Antitza Dantcheva

Published 2019-12-11, Version 1

Creating realistic human videos requires generating appearance and motion simultaneously. To tackle this challenge, we propose the novel spatio-temporal GAN architecture $G^3AN$, which seeks to capture the distribution of high-dimensional video data and to model appearance and motion in a disentangled manner. The latter is achieved by decomposing appearance and motion in a three-stream Generator, where the main stream aims to model spatio-temporal consistency, whereas the two auxiliary streams augment the main stream with multi-scale appearance and motion features, respectively. An extensive quantitative and qualitative analysis shows that our model systematically and significantly outperforms state-of-the-art methods on the facial expression datasets MUG and UvA-NEMO, as well as on the human action datasets Weizmann and UCF101. Additional analysis of the learned latent representations confirms the successful decomposition of appearance and motion.
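
The idea of feeding separate appearance and motion codes into a three-stream generator can be illustrated with a toy sketch. This is not the G3AN architecture, which the abstract describes only at a high level and which relies on learned layers; every name here (`sample_latents`, `spatial_stream`, `temporal_stream`, `toy_generator`), the fixed sinusoidal "features", and the additive fusion are assumptions made purely for illustration.

```python
import math
import random


def sample_latents(dim_a=4, dim_m=4, seed=None):
    """Draw an appearance code z_a and a motion code z_m independently."""
    rng = random.Random(seed)
    z_a = [rng.gauss(0.0, 1.0) for _ in range(dim_a)]
    z_m = [rng.gauss(0.0, 1.0) for _ in range(dim_m)]
    return z_a, z_m


def spatial_stream(z_a, height, width):
    """Hypothetical appearance stream: one H x W map shared by all frames."""
    return [
        [
            sum(z * math.sin((i + 1) * (y + 1) + (x + 1)) for i, z in enumerate(z_a))
            for x in range(width)
        ]
        for y in range(height)
    ]


def temporal_stream(z_m, frames):
    """Hypothetical motion stream: one value per time step, shared by all pixels."""
    return [
        sum(z * math.cos((i + 1) * (t + 1)) for i, z in enumerate(z_m))
        for t in range(frames)
    ]


def toy_generator(z_a, z_m, frames=4, height=3, width=3):
    """Toy main stream: fuse both auxiliary outputs into a T x H x W 'video'.

    Additive fusion is an assumption chosen for simplicity, not the paper's method.
    """
    appearance = spatial_stream(z_a, height, width)
    motion = temporal_stream(z_m, frames)
    return [
        [[appearance[y][x] + motion[t] for x in range(width)] for y in range(height)]
        for t in range(frames)
    ]


# Same appearance code, two different motion codes.
z_a, z_m_first = sample_latents(seed=0)
_, z_m_second = sample_latents(seed=1)
video_first = toy_generator(z_a, z_m_first)
video_second = toy_generator(z_a, z_m_second)
```

In this toy setting, keeping `z_a` fixed while changing `z_m` alters how each frame evolves over time but leaves the spatial pattern within a frame unchanged, which is the kind of behavior a disentangled representation is meant to exhibit.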

Related articles:
arXiv:2202.09487 [cs.CV] (Published 2022-02-19)
SAGE: SLAM with Appearance and Geometry Prior for Endoscopy
arXiv:2311.17982 [cs.CV] (Published 2023-11-29)
VBench: Comprehensive Benchmark Suite for Video Generative Models
Ziqi Huang et al.
arXiv:2203.14074 [cs.CV] (Published 2022-03-26)
V3GAN: Decomposing Background, Foreground and Motion for Video Generation