arXiv:1707.04993 [cs.CV]

MoCoGAN: Decomposing Motion and Content for Video Generation

Sergey Tulyakov, Ming-Yu Liu, Xiaodong Yang, Jan Kautz

Published 2017-07-17 (Version 1)

Visual information in a natural video can be decomposed into two major components: content and motion. While content encodes the objects present in the video, motion encodes the object dynamics. Based on this prior, we propose the Motion and Content decomposed Generative Adversarial Network (MoCoGAN) framework for video generation. The proposed framework generates a video clip by sequentially mapping random noise vectors to video frames. We divide a random noise vector into content and motion parts. The content part, modeled by a Gaussian, is kept fixed when generating individual frames in a short video clip, since the content in a short clip remains largely the same. The motion part, modeled by a recurrent neural network, represents the dynamics in the video. Despite the lack of supervision signals for the motion-content decomposition in natural videos, we show that the MoCoGAN framework can learn to decompose these two factors through a novel adversarial training scheme. Experimental results on action, facial expression, and Tai Chi datasets, along with comparisons to the state of the art, verify the effectiveness of the proposed framework. We further show that, by fixing the content noise while changing the motion noise, MoCoGAN learns to generate videos of different dynamics of the same object, and, by fixing the motion noise while changing the content noise, MoCoGAN learns to generate videos of the same motion from different objects. More information is available on our project page (https://github.com/sergeytulyakov/mocogan).
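The latent construction described in the abstract — one Gaussian content code held fixed across a clip, plus a per-frame motion code produced by a recurrent update — can be sketched as follows. This is only an illustrative toy, not the paper's implementation: the dimensions, the decay constant, and the simple exponential-smoothing recurrence are assumptions standing in for the learned RNN in MoCoGAN.

```python
import random

def sample_clip_latents(dim_content=8, dim_motion=4, num_frames=5, seed=0):
    """Sketch of a MoCoGAN-style latent sequence:
    one content code per clip, one motion code per frame."""
    rng = random.Random(seed)
    # Content part: sampled once from a Gaussian, fixed for all frames.
    z_content = [rng.gauss(0.0, 1.0) for _ in range(dim_content)]
    # Motion part: a toy recurrent update standing in for the paper's
    # learned RNN (purely illustrative, not the actual model).
    h = [0.0] * dim_motion
    latents = []
    for _ in range(num_frames):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim_motion)]
        h = [0.9 * hi + 0.1 * ei for hi, ei in zip(h, eps)]
        # Each frame's latent is [content | motion]; in MoCoGAN a
        # generator network would map this vector to a video frame.
        latents.append(z_content + h)
    return latents
```

Fixing `seed` while resampling the motion noise would vary the dynamics of the same "object", mirroring the abstract's qualitative experiment.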

Related articles:
arXiv:2311.17982 [cs.CV] (Published 2023-11-29)
VBench: Comprehensive Benchmark Suite for Video Generative Models
Ziqi Huang et al.
arXiv:2409.11367 [cs.CV] (Published 2024-09-17)
OSV: One Step is Enough for High-Quality Image to Video Generation
Xiaofeng Mao et al.
arXiv:2408.06070 [cs.CV] (Published 2024-08-12)
ControlNeXt: Powerful and Efficient Control for Image and Video Generation