arXiv Analytics

arXiv:2403.12034 [cs.CV]

VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models

Junlin Han, Filippos Kokkinos, Philip Torr

Published 2024-03-18, updated 2024-07-18 (version 2)

This paper presents a novel method for building scalable 3D generative models from pre-trained video diffusion models. The primary obstacle in developing foundation 3D generative models is the limited availability of 3D data: unlike images, text, or videos, 3D data are not readily accessible and are difficult to acquire, resulting in a significant disparity in scale compared to the vast quantities of other data types. To address this issue, we propose using a video diffusion model, trained on extensive volumes of text, images, and videos, as a knowledge source for 3D data. By unlocking its multi-view generative capabilities through fine-tuning, we generate a large-scale synthetic multi-view dataset and use it to train a feed-forward 3D generative model. The proposed model, VFusion3D, trained on nearly 3M synthetic multi-view samples, can generate a 3D asset from a single image in seconds and achieves superior performance compared with current state-of-the-art feed-forward 3D generative models, with users preferring our results over 90% of the time.
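To make the two-stage recipe described in the abstract concrete, the following is a minimal, heavily simplified sketch in PyTorch. It replaces the fine-tuned video diffusion model with a stand-in multi-view generator and VFusion3D's reconstructor with a toy feed-forward network; every module name, shape, and hyperparameter below is a hypothetical placeholder chosen for illustration, not the authors' actual architecture or training setup.

    # Sketch of the two-stage pipeline: (1) a frozen multi-view generator
    # stands in for the fine-tuned video diffusion model and supplies
    # synthetic multi-view supervision; (2) a feed-forward model is trained
    # to map a single image to multi-view renders of the same object.
    # All components are illustrative placeholders.

    import torch
    import torch.nn as nn

    NUM_VIEWS = 4      # hypothetical number of synthetic views per object
    IMG_RES = 64       # hypothetical render resolution, kept tiny for the sketch

    class MultiViewGenerator(nn.Module):
        """Placeholder for the fine-tuned video diffusion model: given one
        image, it emits a short multi-view 'video' of the same object."""
        def __init__(self):
            super().__init__()
            self.net = nn.Conv2d(3, 3 * NUM_VIEWS, kernel_size=3, padding=1)

        def forward(self, image):                      # (B, 3, H, W)
            views = self.net(image)                    # (B, 3*V, H, W)
            b, _, h, w = views.shape
            return views.view(b, NUM_VIEWS, 3, h, w)   # (B, V, 3, H, W)

    class FeedForward3DModel(nn.Module):
        """Placeholder for the feed-forward 3D generative model: encodes a
        single image into a latent and decodes multi-view renders from it."""
        def __init__(self, latent_dim=256):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(16, latent_dim),
            )
            self.render_head = nn.Linear(latent_dim, NUM_VIEWS * 3 * IMG_RES * IMG_RES)

        def forward(self, image):
            latent = self.encoder(image)
            renders = self.render_head(latent)
            return renders.view(-1, NUM_VIEWS, 3, IMG_RES, IMG_RES)

    def train_step(generator, model, optimizer, image):
        """One step of stage 2: supervise the feed-forward model with
        synthetic multi-view data from the frozen generator."""
        with torch.no_grad():
            target_views = generator(image)            # synthetic supervision
        pred_views = model(image)
        loss = nn.functional.mse_loss(pred_views, target_views)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    if __name__ == "__main__":
        gen, model = MultiViewGenerator(), FeedForward3DModel()
        opt = torch.optim.Adam(model.parameters(), lr=1e-4)
        dummy_image = torch.rand(2, 3, IMG_RES, IMG_RES)
        print("loss:", train_step(gen, model, opt, dummy_image))

In the paper's actual setting, the generator is a large video diffusion model fine-tuned for multi-view generation and the second stage is trained on roughly 3M synthetic multi-view samples; the sketch only illustrates the data flow between the two stages.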

Comments: ECCV 2024. Project page: https://junlinhan.github.io/projects/vfusion3d.html
Categories: cs.CV, cs.GR, cs.LG
Related articles:
arXiv:2305.10474 [cs.CV] (Published 2023-05-17)
Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models
Songwei Ge et al.
arXiv:2409.07452 [cs.CV] (Published 2024-09-11)
Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models
arXiv:2312.02813 [cs.CV] (Published 2023-12-05)
BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models