arXiv:2405.15881 Abstract | arXiv Analytics

arXiv:2405.15881 [cs.CV]

Scaling Diffusion Mamba with Bidirectional SSMs for Efficient Image and Video Generation

Published 2024-05-24, Version 1

The Mamba architecture, known for its selective state space approach, has recently shown potential for efficient modeling of long sequences. However, its application in image generation remains underexplored. Traditional diffusion transformers (DiT), which utilize self-attention blocks, are effective, but their computational complexity scales quadratically with the input length, limiting their use for high-resolution images. To address this challenge, we introduce a novel diffusion architecture, Diffusion Mamba (DiM), which forgoes traditional attention mechanisms in favor of a scalable alternative. By harnessing the inherent efficiency of the Mamba architecture, DiM achieves rapid inference times and reduced computational load, maintaining linear complexity with respect to sequence length. Our architecture not only scales effectively but also outperforms existing diffusion transformers in both image and video generation tasks. The results affirm the scalability and efficiency of DiM, establishing a new benchmark for image and video generation techniques. This work advances the field of generative models and paves the way for further applications of scalable architectures.
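The quadratic-versus-linear scaling claim can be illustrated with a minimal sketch that counts patch tokens for a square image and compares how a pairwise (self-attention-style) interaction count and a single sequential scan grow with resolution. The patch size of 16, the helper names, and the choice to count only token interactions (ignoring hidden dimension, constant factors, and any latent-space downsampling) are assumptions made for illustration, not details taken from the paper.

```python
def num_tokens(image_size: int, patch_size: int) -> int:
    """Number of patch tokens for a square image split into square patches."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size
    return side * side


def attention_pair_count(n: int) -> int:
    """Token pairs compared by full self-attention: grows as n^2."""
    return n * n


def scan_step_count(n: int) -> int:
    """Steps taken by one sequential scan over the tokens: grows as n."""
    return n


PATCH_SIZE = 16  # hypothetical value, chosen only for this illustration

for size in (256, 512, 1024):
    n = num_tokens(size, PATCH_SIZE)
    print(size, n, attention_pair_count(n), scan_step_count(n))
```

Under these assumptions, doubling the image side quadruples the token count, so the pairwise count grows by a factor of 16 while the scan count grows by a factor of 4. That widening gap is why quadratic attention becomes the bottleneck at high resolution.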

Categories: cs.CV, cs.AI, cs.LG

Keywords: video generation, scaling diffusion mamba, efficient image, bidirectional ssms, architecture

Related articles:

arXiv:2104.10157 [cs.CV] (Published 2021-04-20)

VideoGPT: Video Generation using VQ-VAE and Transformers

Wilson Yan, Yunzhi Zhang, Pieter Abbeel, Aravind Srinivas

arXiv:1907.00274 [cs.CV] (Published 2019-06-29)

NetTailor: Tuning the Architecture, Not Just the Weights

Pedro Morgado, Nuno Vasconcelos

arXiv:2202.14020 [cs.CV] (Published 2022-02-28)

State-of-the-Art in the Architecture, Methods and Applications of StyleGAN

Amit H. Bermano et al.
