arXiv Analytics

Sign in

arXiv:2302.05543 [cs.CV]AbstractReferencesReviewsResources

Adding Conditional Control to Text-to-Image Diffusion Models

Lvmin Zhang, Maneesh Agrawala

Published 2023-02-10Version 1

We present a neural network structure, ControlNet, to control pretrained large diffusion models to support additional input conditions. The ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on a personal devices. Alternatively, if powerful computation clusters are available, the model can scale to large amounts (millions to billions) of data. We report that large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs like edge maps, segmentation maps, keypoints, etc. This may enrich the methods to control large diffusion models and further facilitate related applications.

Related articles: Most relevant | Search more
arXiv:2312.05849 [cs.CV] (Published 2023-12-10)
InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models
arXiv:2302.08453 [cs.CV] (Published 2023-02-16)
T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models
arXiv:2305.17431 [cs.CV] (Published 2023-05-27)
Towards Consistent Video Editing with Text-to-Image Diffusion Models