Diffusion models have become a powerful tool for generating high-quality images from textual descriptions. Despite their successes, these models often exhibit limited diversity in their sampled images, particularly when sampling with high classifier-free guidance weights. To address this issue, we present Kaleido, a novel approach that enhances sample diversity by incorporating autoregressive latent priors. Kaleido integrates an autoregressive language model that encodes the original caption and generates latent variables, which serve as abstract, intermediate representations for guiding and facilitating the image generation process. In this paper, we explore a variety of discrete latent representations, including textual descriptions, detection bounding boxes, object blobs, and visual tokens. These representations diversify and enrich the input conditions to the diffusion model, enabling more diverse outputs. Our experimental results demonstrate that Kaleido effectively broadens the diversity of image samples generated from a given textual description while maintaining high image quality. Furthermore, we show that Kaleido adheres closely to the guidance provided by the generated latent variables, demonstrating its ability to effectively control and direct the image generation process.
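To make the two-stage pipeline concrete, the sketch below shows one plausible reading of the sampling loop: an autoregressive prior encodes the caption and samples a sequence of discrete latent tokens, and those tokens are appended to the conditioning of a diffusion decoder. This is a minimal, hypothetical illustration, not the paper's implementation; the class names (`LatentPrior`, `DummyDiffusion`), the GRU-based sampler, the vocabulary size, and all shapes are assumptions made for readability.

```python
# Hypothetical sketch of Kaleido-style two-stage sampling.
# All module names, shapes, and the GRU architecture are illustrative
# assumptions, not the actual Kaleido implementation.
import torch
import torch.nn as nn


class LatentPrior(nn.Module):
    """Toy autoregressive prior: encodes caption tokens, then samples
    a sequence of discrete latent tokens (e.g. visual tokens)."""

    def __init__(self, vocab=1024, dim=256, latent_len=16):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab)
        self.latent_len = latent_len

    @torch.no_grad()
    def sample(self, caption_tokens, temperature=1.0):
        # Encode the caption into the recurrent state.
        x = self.embed(caption_tokens)
        _, h = self.rnn(x)
        tok = caption_tokens[:, -1:]
        latents = []
        # Autoregressively draw latent tokens one at a time.
        for _ in range(self.latent_len):
            out, h = self.rnn(self.embed(tok), h)
            probs = torch.softmax(self.head(out[:, -1]) / temperature, -1)
            tok = torch.multinomial(probs, 1)
            latents.append(tok)
        return torch.cat(latents, dim=1)  # (batch, latent_len)


class DummyDiffusion(nn.Module):
    """Stand-in for a conditional diffusion decoder; returns noise
    in place of a real denoising loop."""

    @torch.no_grad()
    def sample(self, cond):
        return torch.randn(cond.size(0), 3, 64, 64)  # placeholder image


def generate(caption_tokens, prior, diffusion, n_samples=4):
    images = []
    for _ in range(n_samples):
        z = prior.sample(caption_tokens)           # fresh discrete latents
        cond = torch.cat([caption_tokens, z], 1)   # enriched conditioning
        images.append(diffusion.sample(cond))      # guided diffusion sampling
    return images


prior = LatentPrior()
diffusion = DummyDiffusion()
caption = torch.randint(0, 1024, (1, 8))  # toy caption token ids
samples = generate(caption, prior, diffusion, n_samples=4)
```

Under this reading, diversity comes from the prior rather than the denoiser: each call to `prior.sample` draws different latent tokens for the same caption, so repeated generations vary even when the diffusion stage is run with a high guidance weight.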