Image generation has come a long way in the last year. The saga began with the release of Stable Diffusion, whose success drew researchers from many domains to push the field further. It is now possible to generate photorealistic images and even videos with diffusion models. What can we say? Diffusion models have become the de facto solution in the generative AI domain in just a couple of months.
Diffusion models have two strong features that make them the go-to solution for generation: the ability to capture complex distributions and stability during training. Unlike other types of generative models, such as GANs, diffusion models do not require a discriminator network to be trained in tandem. This simplifies the training process and makes the model less prone to problems such as mode collapse, where the model generates only a limited set of outputs.
However, not everything is rosy. Diffusion models have a big drawback that puts them out of reach for many people: an extremely slow and expensive training process. These models require very large datasets to work well; we are talking about billions of images. As a result, training a diffusion model from scratch is simply not feasible for most people.
What if there was another way? What if we could make diffusion models train more efficiently? What if we could lower the extremely high cost of training so they become more affordable? Time to meet Patch Diffusion.
Patch Diffusion is a plug-and-play training technique that is independent of any particular choice of UNet architecture, sampler, noise schedule, and so on. The method learns a conditional score function on image patches, where both the location of the patch within the original image and the patch size are the conditions. By training on patches instead of full images, the computational load per iteration is significantly reduced.
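To make the patch-wise training idea concrete, here is a minimal sketch (not the authors' code) of what one training iteration might look like: instead of feeding the full image to the diffusion model, a random patch is cropped and its size and location are recorded so they can later be used as conditioning information. The function name `sample_patch` and the normalization choice are illustrative assumptions.

```python
# A minimal sketch of patch-wise sampling for training, assuming PyTorch.
import torch

def sample_patch(image: torch.Tensor, patch_size: int):
    """Crop a random patch from a (C, H, W) image and return it
    together with its normalized top-left location."""
    _, h, w = image.shape
    top = torch.randint(0, h - patch_size + 1, (1,)).item()
    left = torch.randint(0, w - patch_size + 1, (1,)).item()
    patch = image[:, top:top + patch_size, left:left + patch_size]
    # Normalize the location so the condition is resolution-independent.
    location = (top / h, left / w)
    return patch, location

# Example: a 64x64 patch from a 256x256 RGB image.
image = torch.rand(3, 256, 256)
patch, loc = sample_patch(image, patch_size=64)
print(patch.shape, loc)  # torch.Size([3, 64, 64]) (top, left)
```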
To incorporate the patch location condition, a pixel-level coordinate system is constructed and the patch location information is encoded as additional coordinate channels. These channels are then concatenated with the original image channels as input to the diffusion model. Furthermore, Patch Diffusion proposes to diversify patch sizes in a progressive or stochastic schedule throughout training to capture cross-region dependency at multiple scales.
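Below is a hedged illustration of these two ideas: building a pixel-level coordinate grid for a cropped patch, normalized against the full image, and concatenating it with the image channels, plus a simple stochastic patch-size schedule. The function names and the mixing weights in `pick_patch_size` are assumptions for the sketch, not taken from the official implementation.

```python
# Illustrative sketch of coordinate-channel conditioning and a stochastic
# patch-size schedule, assuming PyTorch; details differ from the paper's code.
import random
import torch

def add_coord_channels(patch: torch.Tensor, top: int, left: int,
                       full_h: int, full_w: int) -> torch.Tensor:
    """patch: (C, p, p) crop taken at (top, left) from a full_h x full_w image."""
    _, p, _ = patch.shape
    ys = torch.arange(top, top + p, dtype=torch.float32) / (full_h - 1)
    xs = torch.arange(left, left + p, dtype=torch.float32) / (full_w - 1)
    y_grid, x_grid = torch.meshgrid(ys, xs, indexing="ij")  # each (p, p)
    # Output has C + 2 channels: image channels plus (y, x) coordinate maps.
    return torch.cat([patch, y_grid.unsqueeze(0), x_grid.unsqueeze(0)], dim=0)

def pick_patch_size(full_size: int = 256) -> int:
    """Stochastic schedule: mostly small and medium patches, with the
    occasional full-size image to capture global structure (weights assumed)."""
    return random.choices([full_size // 4, full_size // 2, full_size],
                          weights=[0.5, 0.3, 0.2])[0]

patch = torch.rand(3, 64, 64)
conditioned = add_coord_channels(patch, top=32, left=96, full_h=256, full_w=256)
print(conditioned.shape)  # torch.Size([5, 64, 64])
```

The extra coordinate channels let the network know where in the original image each patch came from, which is what allows patches sampled at different locations and scales to be stitched into a coherent score model.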
Patch Diffusion can generate realistic images. Source: https://arxiv.org/abs/2304.12526
The results show that Patch Diffusion can at least double the training speed while maintaining comparable or better generation quality. Furthermore, the method improves the performance of diffusion models trained on relatively small datasets, so using it to train your own diffusion model for a specific use case is now a feasible option.
Check out the Paper. Don't forget to join our 21k+ ML SubReddit, Discord channel, and email newsletter, where we share the latest AI research news, exciting AI projects, and more. If you have any questions about the article above or if we missed anything, feel free to email us at [email protected]
Ekrem Çetinkaya received his B.Sc. in 2018 and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis on image denoising using deep convolutional networks. He is currently pursuing a Ph.D. at the University of Klagenfurt, Austria, and working as a researcher on the ATHENA project. His research interests include deep learning, computer vision, and multimedia networks.