
arXiv:2302.05543 [cs.CV]

Adding Conditional Control to Text-to-Image Diffusion Models

Published 2023-02-10, Version 1

We present a neural network structure, ControlNet, to control pretrained large diffusion models to support additional input conditions. The ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on a personal device. Alternatively, if powerful computation clusters are available, the model can scale to large amounts (millions to billions) of data. We report that large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs like edge maps, segmentation maps, keypoints, etc. This may enrich the methods to control large diffusion models and further facilitate related applications.

Comments: 33 pages

Categories: cs.CV, cs.AI, cs.GR, cs.HC, cs.MM

Keywords: text-to-image diffusion models, adding conditional control, control pretrained large diffusion models, controlnet learns task-specific conditions, support additional input conditions

Related articles:

arXiv:2312.05849 [cs.CV] (Published 2023-12-10)

InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models

Jiun Tian Hoe, Xudong Jiang, Chee Seng Chan, Yap-Peng Tan, Weipeng Hu

arXiv:2302.08453 [cs.CV] (Published 2023-02-16)

T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models

Chong Mou, Xintao Wang, Liangbin Xie, Jian Zhang, Zhongang Qi, Ying Shan, Xiaohu Qie

arXiv:2305.17431 [cs.CV] (Published 2023-05-27)

Towards Consistent Video Editing with Text-to-Image Diffusion Models