arXiv:2306.09869 [cs.CV]

Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models

Geon Yeong Park, Jeongsol Kim, Beomsu Kim, Sang Wan Lee, Jong Chul Ye

Published 2023-06-16, Version 1

Despite the remarkable performance of text-to-image diffusion models in image generation tasks, recent studies have shown that generated images sometimes fail to capture the intended semantic content of the text prompt, a phenomenon often called semantic misalignment. To address this, we present a novel energy-based model (EBM) framework. Specifically, we first formulate EBMs of latent image representations and text embeddings in each cross-attention layer of the denoising autoencoder. We then obtain the gradient of the log posterior of the context vectors, which can be updated and transferred to the subsequent cross-attention layer, thereby implicitly minimizing a nested hierarchy of energy functions. Our latent EBMs further allow zero-shot compositional generation as a linear combination of cross-attention outputs from different contexts. Through extensive experiments, we demonstrate that the proposed method is highly effective in handling various image generation tasks, including multi-concept generation, text-guided image inpainting, and real and synthetic image editing.
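The compositional step mentioned in the abstract, a linear combination of cross-attention outputs from different contexts, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes plain single-head scaled dot-product cross-attention, and the function names, the projection matrices `w_k` and `w_v`, and the mixing `weights` are all hypothetical choices made for illustration. The energy-based context update itself is not shown, since the abstract does not give the energy function.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_k, w_v):
    """Single-head scaled dot-product cross-attention (assumed form).

    queries: (n_pixels, d) latent image features
    context: (n_tokens, d_ctx) text embeddings
    w_k, w_v: (d_ctx, d) hypothetical key/value projections
    """
    keys = context @ w_k
    values = context @ w_v
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    return softmax(scores, axis=-1) @ values


def compose_cross_attention(queries, contexts, weights, w_k, w_v):
    """Weighted sum of cross-attention outputs, one per context."""
    outputs = [cross_attention(queries, c, w_k, w_v) for c in contexts]
    return sum(w * o for w, o in zip(weights, outputs))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(4, 8))        # 4 latent positions, width 8
    ctx_a = rng.normal(size=(3, 6))    # embeddings for one prompt
    ctx_b = rng.normal(size=(5, 6))    # embeddings for another prompt
    w_k = rng.normal(size=(6, 8))
    w_v = rng.normal(size=(6, 8))
    mixed = compose_cross_attention(q, [ctx_a, ctx_b], [0.7, 0.3], w_k, w_v)
    print(mixed.shape)
```

In this sketch the two contexts may have different token counts, because each attention output has the shape of the queries; only the outputs are mixed, never the contexts themselves.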

Categories: cs.CV, cs.AI, cs.CL, cs.LG

Keywords: text-to-image diffusion models, Bayesian context update, energy-based cross attention, image generation tasks, cross-attention layer

Related articles:

arXiv:2303.15233 [cs.CV] (Published 2023-03-27)

Text-to-Image Diffusion Models are Zero-Shot Classifiers

Kevin Clark, Priyank Jaini

arXiv:2302.08453 [cs.CV] (Published 2023-02-16)

T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models

Chong Mou, Xintao Wang, Liangbin Xie, Jian Zhang, Zhongang Qi, Ying Shan, Xiaohu Qie

arXiv:2303.17591 [cs.CV] (Published 2023-03-30)

Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models

Eric Zhang, Kai Wang, Xingqian Xu, Zhangyang Wang, Humphrey Shi