A year ago, generating realistic images with AI was a dream. We were impressed when we saw generated faces that looked just like real ones, even though most outputs had three eyes, two noses, or similar artifacts. However, things changed quickly with the release of diffusion models. Today, it is difficult to distinguish an AI-generated image from a real one.
The ability to generate high-quality images is only one part of the equation. To put these images to use, compressing them efficiently plays an essential role in tasks such as content generation, data storage, transmission, and bandwidth optimization. However, image compression has been dominated by traditional methods such as transform coding and quantization techniques, with limited exploration of generative models.
Despite their success in generating images, diffusion models and score-based generative models have not yet become mainstream approaches for image compression, lagging behind GAN-based methods. They often perform worse than or on par with GAN-based approaches like HiFiC on high-resolution images. Even attempts to repurpose text-to-image models for image compression have yielded unsatisfactory results, producing reconstructions that deviate from the original input or contain unwanted artifacts.
The gap between the performance of score-based generative models on image generation tasks and their limited success in image compression raises intriguing questions and motivates further investigation. It is surprising that models capable of generating high-quality images have not been able to outperform GANs in the specific task of image compression. This discrepancy suggests that there are unique challenges and considerations when applying score-based generative models to compression tasks, requiring specialized approaches to realize their full potential.
So we know it is possible to use score-based generative models for image compression. The question is: how can it be done? Let's jump to the answer.
Google researchers proposed a method that combines a standard autoencoder, optimized for mean squared error (MSE), with a diffusion process that retrieves and adds the fine details discarded by the autoencoder. The bit rate for encoding an image is determined solely by the autoencoder, since the diffusion process does not require any additional bits. By tuning diffusion models specifically for image compression, the authors show that they can outperform several recent generative approaches in terms of image quality.
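To make the division of labor concrete, here is a minimal PyTorch sketch of such a two-stage pipeline. The names (`Autoencoder`, `compress_and_refine`, `refine_fn`) and the layer choices are illustrative placeholders, not the paper's actual architecture or training setup:

```python
# Minimal sketch: an MSE autoencoder fixes the bit rate; a generative
# refiner adds detail afterwards at zero extra bit cost. All names and
# layer sizes here are illustrative assumptions, not the paper's design.
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """MSE-optimized autoencoder; only its quantized latent is transmitted."""
    def __init__(self, ch=192):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, ch, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(ch, ch, 5, stride=2, padding=2),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(ch, ch, 5, stride=2, padding=2, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(ch, 3, 5, stride=2, padding=2, output_padding=1),
        )

    def forward(self, x):
        # Quantized symbols are the only bits sent; training would need a
        # straight-through estimator or uniform noise instead of round().
        y_hat = torch.round(self.enc(x))
        return self.dec(y_hat)

def compress_and_refine(x, autoencoder, refine_fn):
    x_hat = autoencoder(x)   # coarse, MSE-faithful reconstruction (costs all the bits)
    return refine_fn(x_hat)  # generative detail synthesis (costs zero extra bits)

# Shape check with a placeholder refiner (identity):
out = compress_and_refine(torch.rand(1, 3, 64, 64), Autoencoder(), lambda x_hat: x_hat)
```

The key design point is that the refiner is conditioned only on the decoded reconstruction, so it never changes what must be transmitted.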
The method explores two closely related approaches: diffusion models, which exhibit impressive performance but require a large number of sampling steps, and rectified flows, which perform better when fewer sampling steps are allowed.
The two-step approach consists of first encoding the input image with the MSE-optimized autoencoder and then applying a diffusion process or a rectified flow to improve the realism of the reconstruction. The diffusion model uses a noise schedule that is shifted in the opposite direction compared to text-to-image models, prioritizing fine detail over global structure. The rectified-flow model, on the other hand, takes advantage of the pairing provided by the autoencoder to map autoencoder outputs directly to uncompressed images.
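One way to picture the rectified-flow variant: integrate a learned velocity field that carries the autoencoder output toward a sharp image. Below is a hedged sketch under that reading; `velocity_net` is a hypothetical network trained on (reconstruction, original) pairs, and the small Euler step count reflects why nearly straight flows need few steps:

```python
# Hedged sketch of few-step rectified-flow refinement. `velocity_net` is a
# hypothetical model; the paper's exact parameterization may differ.
import torch

@torch.no_grad()
def rectified_flow_refine(x_hat, velocity_net, num_steps=2):
    """Euler-integrate dx/dt = v(x, t) from t=0 (autoencoder output) to t=1.

    Because training pairs (autoencoder output, original image) give the flow
    nearly straight trajectories, a handful of steps is often enough.
    """
    x, dt = x_hat, 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        x = x + dt * velocity_net(x, t)  # one explicit Euler step along the flow
    return x

# Shape check with a dummy (zero) velocity field:
x = rectified_flow_refine(torch.rand(1, 3, 64, 64), lambda x, t: torch.zeros_like(x))
```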
Furthermore, the study reveals specific details that may be useful for future research in this domain. For example, the noise schedule and the amount of noise injected during image generation are shown to significantly impact the results. Interestingly, while text-to-image models benefit from higher noise levels when trained on high-resolution images, reducing the overall noise of the diffusion process turns out to be advantageous for compression. This setting allows the model to focus more on fine details, since the coarse details are already adequately captured by the autoencoder reconstruction.
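To illustrate what "less overall noise" can mean, the sketch below shifts a standard cosine noise schedule up in signal-to-noise ratio. The `shift` knob is a generic parameterization of such a shift, not necessarily the exact one used in the paper:

```python
# Illustrative shifted cosine noise schedule (assumption: the paper's exact
# parameterization may differ; this shows the generic mechanism only).
import math

def shifted_logsnr(t, shift=4.0):
    """Log-SNR of a cosine schedule for t in (0, 1), shifted by 2*log(shift).

    shift > 1 raises the SNR everywhere (less noise overall), letting the
    model focus on fine detail; high-resolution text-to-image models
    typically shift the other way (shift < 1, more noise).
    """
    return -2.0 * math.log(math.tan(math.pi * t / 2.0)) + 2.0 * math.log(shift)

def sigma(t, shift=4.0):
    """Noise level at time t: sigma_t = sqrt(sigmoid(-logSNR))."""
    return (1.0 / (1.0 + math.exp(shifted_logsnr(t, shift)))) ** 0.5

for t in (0.25, 0.5, 0.75):
    print(f"t={t}: sigma={sigma(t):.3f} shifted vs {sigma(t, shift=1.0):.3f} base")
```

At every timestep the shifted schedule injects less noise than the base schedule, which matches the intuition that only fine detail, not global structure, needs to be resynthesized.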
Check out the Paper. Don't forget to join our 24k+ ML SubReddit, Discord channel, and email newsletter, where we share the latest AI research news, cool AI projects, and more. If you have any questions regarding the above article or if we missed anything, feel free to email us at [email protected]
🚀 Check out 100 AI tools at AI Tools Club
Ekrem Çetinkaya received his B.Sc. in 2018 and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis on image denoising using deep convolutional networks. He received his Ph.D. in 2023 from the University of Klagenfurt, Austria, with his dissertation titled "Video Coding Improvements for HTTP Adaptive Streaming Using Machine Learning." His research interests include deep learning, computer vision, video encoding, and multimedia networking.