This Tencent AI article introduces ELLA, a machine learning method that equips current text-to-image diffusion models with next-generation large language models without training either the LLM or the U-Net.
Diffusion models have driven significant progress in text-to-image generation. However, current models frequently use CLIP as a ...