arXiv:1707.04993 [cs.CV]

MoCoGAN: Decomposing Motion and Content for Video Generation

Sergey Tulyakov, Ming-Yu Liu, Xiaodong Yang, Jan Kautz

Published 2017-07-17 (Version 1)

Visual information in a natural video can be decomposed into two major components: content and motion. While content encodes the objects present in the video, motion encodes the object dynamics. Based on this prior, we propose the Motion and Content decomposed Generative Adversarial Network (MoCoGAN) framework for video generation. The proposed framework generates a video clip by sequentially mapping random noise vectors to video frames. We divide a random noise vector into content and motion parts. The content part, modeled by a Gaussian, is kept fixed when generating individual frames in a short video clip, since the content in a short clip remains largely the same. The motion part, modeled by a recurrent neural network, represents the dynamics in the video. Despite the lack of supervision signals for the motion-content decomposition in natural videos, we show that the MoCoGAN framework can learn to decompose these two factors through a novel adversarial training scheme. Experimental results on action, facial expression, and Tai Chi datasets, along with comparisons to the state of the art, verify the effectiveness of the proposed framework. We further show that, by fixing the content noise while changing the motion noise, MoCoGAN learns to generate videos of different dynamics of the same object, and, by fixing the motion noise while changing the content noise, MoCoGAN learns to generate videos of the same motion from different objects. More information is available on our project page (https://github.com/sergeytulyakov/mocogan).
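The latent construction described in the abstract — one Gaussian content code held fixed across a clip, plus a per-frame motion code produced by a recurrent update — can be sketched as follows. This is only an illustrative toy, not the paper's implementation: the dimensions, the decay constant, and the simple exponential-smoothing recurrence are assumptions standing in for the learned RNN in MoCoGAN.

```python
import random

def sample_clip_latents(dim_content=8, dim_motion=4, num_frames=5, seed=0):
    """Sketch of a MoCoGAN-style latent sequence:
    one content code per clip, one motion code per frame."""
    rng = random.Random(seed)
    # Content part: sampled once from a Gaussian, fixed for all frames.
    z_content = [rng.gauss(0.0, 1.0) for _ in range(dim_content)]
    # Motion part: a toy recurrent update standing in for the paper's
    # learned RNN (purely illustrative, not the actual model).
    h = [0.0] * dim_motion
    latents = []
    for _ in range(num_frames):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim_motion)]
        h = [0.9 * hi + 0.1 * ei for hi, ei in zip(h, eps)]
        # Each frame's latent is [content | motion]; in MoCoGAN a
        # generator network would map this vector to a video frame.
        latents.append(z_content + h)
    return latents
```

Fixing `seed` while resampling the motion noise would vary the dynamics of the same "object", mirroring the abstract's qualitative experiment.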

Related articles:
arXiv:2311.17982 [cs.CV] (Published 2023-11-29)
VBench: Comprehensive Benchmark Suite for Video Generative Models
Ziqi Huang et al.
arXiv:2409.11367 [cs.CV] (Published 2024-09-17)
OSV: One Step is Enough for High-Quality Image to Video Generation
Xiaofeng Mao et al.
arXiv:2408.06070 [cs.CV] (Published 2024-08-12)
ControlNeXt: Powerful and Efficient Control for Image and Video Generation