arXiv:2311.10093 [cs.CV]

The Chosen One: Consistent Characters in Text-to-Image Diffusion Models

Omri Avrahami, Amir Hertz, Yael Vinker, Moab Arar, Shlomi Fruchter, Ohad Fried, Daniel Cohen-Or, Dani Lischinski

Published 2023-11-16 (Version 1)

Recent advances in text-to-image generation models have unlocked vast potential for visual creativity. However, these models struggle to generate consistent characters, a capability crucial for numerous real-world applications such as story visualization, game development asset design, advertising, and more. Current methods typically rely on multiple pre-existing images of the target character or involve labor-intensive manual processes. In this work, we propose a fully automated solution for consistent character generation, with the sole input being a text prompt. We introduce an iterative procedure that, at each stage, identifies a coherent set of images sharing a similar identity and extracts a more consistent identity from this set. Our quantitative analysis demonstrates that our method strikes a better balance between prompt alignment and identity consistency than the baseline methods, and these findings are reinforced by a user study. Finally, we showcase several practical applications of our approach. The project page is available at https://omriavrahami.com/the-chosen-one
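
The abstract describes the method only at a high level: repeatedly generate candidates, find a cohesive subset sharing an identity, and refine toward that identity. The sketch below is a rough illustration of what such an identify-cluster-refine loop could look like, not the authors' implementation; the callables generate_fn, embed_fn, and personalize_fn are hypothetical placeholders standing in for a text-to-image sampler, an identity feature extractor, and a model personalization step (e.g. LoRA fine-tuning on the chosen images).

```python
# Illustrative sketch only -- NOT the paper's code. All helper callables are
# hypothetical placeholders for components the abstract implies.
import numpy as np
from sklearn.cluster import KMeans


def choose_consistent_identity(prompt, model, generate_fn, embed_fn, personalize_fn,
                               n_iters=5, n_images=64, n_clusters=8):
    """Iteratively converge on a single consistent character identity."""
    for _ in range(n_iters):
        # 1) Sample a batch of candidate images for the prompt.
        images = generate_fn(model, prompt, n_images)
        # 2) Embed each image into an identity feature space.
        feats = np.stack([embed_fn(im) for im in images])
        # 3) Cluster the embeddings and keep the most cohesive cluster
        #    (lowest mean distance to its own centroid).
        labels = KMeans(n_clusters=n_clusters).fit_predict(feats)
        best_cluster, best_score = None, np.inf
        for c in range(n_clusters):
            members = feats[labels == c]
            if len(members) < 2:
                continue
            score = np.linalg.norm(members - members.mean(axis=0), axis=1).mean()
            if score < best_score:
                best_cluster, best_score = c, score
        chosen = [im for im, lbl in zip(images, labels) if lbl == best_cluster]
        # 4) Personalize the model on the chosen set so the next iteration
        #    produces an increasingly consistent identity.
        model = personalize_fn(model, chosen)
    return model
```

A full implementation would also need a stopping criterion (e.g. when the chosen cluster's cohesion stops improving); the fixed iteration count here is just for brevity.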

Comments: Project page is available at https://omriavrahami.com/the-chosen-one
Categories: cs.CV, cs.GR, cs.LG
Related articles:
arXiv:2406.02820 [cs.CV] (Published 2024-06-04)
ORACLE: Leveraging Mutual Information for Consistent Character Generation with LoRAs in Diffusion Models
arXiv:2301.13826 [cs.CV] (Published 2023-01-31)
Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models
arXiv:2407.18658 [cs.CV] (Published 2024-07-26)
Adversarial Robustification via Text-to-Image Diffusion Models