{ "id": "1707.04993", "version": "v1", "published": "2017-07-17T03:42:17.000Z", "updated": "2017-07-17T03:42:17.000Z", "title": "MoCoGAN: Decomposing Motion and Content for Video Generation", "authors": [ "Sergey Tulyakov", "Ming-Yu Liu", "Xiaodong Yang", "Jan Kautz" ], "categories": [ "cs.CV" ], "abstract": "Visual information in a natural video can be decomposed into two major components: content and motion. While content encodes the objects present in the video, motion encodes the object dynamics. Based on this prior, we propose the Motion and Content decomposed Generative Adversarial Network (MoCoGAN) framework for video generation. The proposed framework generates a video clip by sequentially mapping random noise vectors to video frames. We divide a random noise vector into content and motion parts. The content part, modeled by a Gaussian, is kept fixed when generating individual frames in a short video clip, since the content in a short clip remains largely the same. On the other hand, the motion part, modeled by a recurrent neural network, aims at representing the dynamics in a video. Despite the lack of supervision signals on the motion-content decomposition in natural videos, we show that the MoCoGAN framework can learn to decompose these two factors through a novel adversarial training scheme. Experimental results on action, facial expression, and Tai Chi datasets, along with comparisons to the state of the art, verify the effectiveness of the proposed framework. We further show that, by fixing the content noise while changing the motion noise, MoCoGAN learns to generate videos of different dynamics of the same object, and, by fixing the motion noise while changing the content noise, MoCoGAN learns to generate videos of the same motion from different objects. 
More information is available on our project page (https://github.com/sergeytulyakov/mocogan).", "revisions": [ { "version": "v1", "updated": "2017-07-17T03:42:17.000Z" } ], "analyses": { "keywords": [ "video generation", "mapping random noise vectors", "decomposing motion", "decomposed generative adversarial network", "generate videos" ], "note": { "typesetting": "TeX", "pages": 0, "language": "en", "license": "arXiv", "status": "editable" } } }