Image by author
Diffusers is a Python library developed and maintained by HuggingFace. It simplifies the development and inference of diffusion models for generating images from user-defined prompts. The code is openly available on GitHub, with 22.4k stars on the repository. HuggingFace also hosts a wide variety of Stable Diffusion checkpoints, and several other diffusion models can be used easily with the library.
Installation and configuration
It's good to start with a fresh Python environment to avoid conflicts between library versions and dependencies.
To set up a new Python environment, run the following commands:
python3 -m venv venv
source venv/bin/activate
Installing the Diffusers library is simple. It is provided as an official pip package and uses PyTorch internally. Additionally, many diffusion models are based on the Transformer architecture, so loading a model will also require the transformers pip package.
pip install 'diffusers[torch]' transformers
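Once installed, a quick sanity check confirms the packages import correctly (the printed versions will differ depending on when you install):

import diffusers
import transformers
import torch

print("diffusers:", diffusers.__version__)
print("transformers:", transformers.__version__)
print("torch:", torch.__version__)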
Using Diffusers for AI-generated images
The Diffusers library makes it extremely easy to generate images from a prompt using Stable Diffusion models. Here, we will go through a simple example line by line to see the different parts of the Diffusers library.
Imports
import torch
from diffusers import AutoPipelineForText2Image
The torch package is needed for the general setup and configuration of the diffusion pipeline. AutoPipelineForText2Image is a class that automatically identifies the model being loaded, for example, Stable Diffusion 1.5, Stable Diffusion 2.1, or SDXL, and loads the appropriate classes and modules internally. This saves us the trouble of changing the pipeline every time we want to load a new model.
Loading the models
A diffusion model is made up of several components, including but not limited to a text encoder, a UNet, schedulers, and a variational autoencoder (VAE). We can load the modules separately, but the Diffusers library provides a builder method that can load a pre-trained model from a structured checkpoint directory. For a beginner, it can be difficult to know which pipeline to use, so AutoPipeline makes it easy to load a model for a specific task.
In this example, we will load an SDXL model that is openly available on HuggingFace, trained by Stability AI. The checkpoint is organized into named subdirectories, one per component (unet, vae, text_encoder, scheduler, and so on), and each subdirectory contains its own safetensors weight file.
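As an aside, the components can also be loaded individually if you ever need just one of them. A minimal sketch for loading only the UNet, assuming the standard subfolder layout described above:

from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    subfolder="unet",  # load only the UNet weights from the checkpoint
)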
To load the model in our code, we use the AutoPipelineForText2Image class and call the from_pretrained function.
pipeline = AutoPipelineForText2Image.from_pretrained(
"stability/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float32,  # float32 for CPU, float16 for GPU
)
We provide the model path as the first argument. This can be the name of the HuggingFace model card as shown above or a local directory where you have downloaded the model beforehand. Additionally, we define the precision of the model's weights as a keyword argument. We normally use 32-bit floating-point precision when we have to run the model on a CPU. However, running a diffusion model is computationally expensive, and inference on a CPU can take hours. For a GPU, we can use either 16-bit or 32-bit data types, but 16-bit is preferable because it uses less GPU memory.
The above command will download the HuggingFace model and may take time depending on your internet connection. Model sizes can range from 1 GB to over 10 GB.
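If you are running on a GPU, a common variation is to load the checkpoint in half precision. The sketch below assumes the repository provides fp16 safetensors weights (the variant and use_safetensors arguments are optional):

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # or a local directory path
    torch_dtype=torch.float16,  # half precision to reduce GPU memory usage
    variant="fp16",  # download the fp16 weight files if available
    use_safetensors=True,  # prefer safetensors over pickle weights
)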
Once the model is loaded, we need to move it to the appropriate hardware device. Use the following code to move the model to the CPU or GPU. Note that on Apple Silicon chips, you should move the model to an MPS device to take advantage of the GPU on macOS.
# "mps" if on M1/M2 MacOS Device
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
pipeline.to(DEVICE)
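If GPU memory is limited, Diffusers also offers optional memory-saving helpers. A minimal sketch, trading some speed for a smaller memory footprint:

# Optional: compute attention in slices to lower peak memory usage
pipeline.enable_attention_slicing()
# Alternatively, offload idle submodules to the CPU
# (requires the accelerate package; use instead of pipeline.to("cuda"))
# pipeline.enable_model_cpu_offload()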
Inference
We are now ready to generate images from text prompts using the loaded diffusion model. We can run inference with the following code:
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
results = pipeline(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=20,
)
We call the pipeline object with multiple keyword arguments to control the generated image. The prompt is a string that describes the image we want to generate. We can also set the height and width of the generated image, but both must be multiples of 8 because of how the underlying model downsamples images into latents. Finally, the number of inference steps can be adjusted to control the quality of the final image: more denoising steps result in higher-quality images, but they take longer to generate.
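The pipeline accepts further optional arguments as well, such as a negative prompt, the guidance scale, and a seeded generator for reproducible outputs. A minimal sketch (the argument values here are illustrative, not recommendations):

generator = torch.Generator(DEVICE).manual_seed(42)  # fixed seed for reproducibility
results = pipeline(
    prompt=prompt,
    negative_prompt="blurry, low quality, distorted",  # what the image should avoid
    guidance_scale=7.0,  # how strongly the image follows the prompt
    num_inference_steps=20,
    generator=generator,
)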
The pipeline returns an object containing a list of generated images. We can access the first image in the list and work with it as a Pillow image to save or display it.
img = results.images[0]
img.save('result.png')
img # To show the image in Jupyter notebook
Generated image
Advanced uses
The text-to-image example above is just a basic tutorial highlighting the core usage of the Diffusers library. The library also provides many other capabilities, including image-to-image generation, inpainting, outpainting, and ControlNets. In addition, it gives precise control over each module of the diffusion model, and these modules can be used as small building blocks that combine seamlessly into custom diffusion pipelines. It also provides functionality to train diffusion models on your own datasets and use cases.
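As a small taste of these features, the text-to-image pipeline we already loaded can be reused for image-to-image generation without downloading the weights again. A rough sketch, where input.png is a hypothetical starting image:

from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

img2img = AutoPipelineForImage2Image.from_pipe(pipeline)  # reuse the loaded components
init_image = load_image("input.png")  # hypothetical starting image
result = img2img(
    prompt="Astronaut in a jungle, oil painting style",
    image=init_image,
    strength=0.6,  # how much the original image is altered
).images[0]
result.save("img2img_result.png")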
Conclusion
In this article, we covered the basics of the Diffusers library and how to run simple inference with a diffusion model. It is one of the most widely used generative AI libraries, with new features and improvements added every day. There are many different use cases and features you can try, and the HuggingFace documentation and GitHub repository are the best places to start.
Kanwal Mehreen is a machine learning engineer and technical writer with a deep passion for data science and the intersection of AI with medicine. She is the co-author of the eBook “Maximize Productivity with ChatGPT.” As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She is also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is a passionate advocate for change and founded FEMCodes to empower women in STEM fields.