Stability AI has introduced Stable Diffusion 3.5 in multiple variants: Stable Diffusion 3.5 Large, Large Turbo, and Medium. These models are customizable and can run on commodity hardware. Let's explore them, learn how to access them, and run inference to see what Stable Diffusion brings to the table this time.
Overview
- Availability: The models can be downloaded from Hugging Face and are accessible through various platforms such as the Stability AI API, Replicate, and others.
- Safety and security: Stability AI has implemented safety protocols designed to minimize potential misuse and promote responsible use.
- Future improvements: Plans include ControlNet support, allowing for more advanced and precise control over the image generation process.
- Platform flexibility: Users can access and integrate these models into their workflows across different platforms, providing flexibility of use.
Stable Diffusion 3.5 models
Stable Diffusion 3.5 offers a range of models:
- Stable Diffusion 3.5 Large: With 8.1 billion parameters, this flagship model offers top-notch quality and prompt adherence, making it the most powerful in the Stable Diffusion lineup. It is optimized for professional applications at 1-megapixel resolution.
- Stable Diffusion 3.5 Large Turbo: A distilled version of Stable Diffusion 3.5 Large, this model produces high-quality images with excellent prompt adherence in just 4 steps, offering significantly faster inference than the standard Large model.
- Stable Diffusion 3.5 Medium: With 2.5 billion parameters and the improved MMDiT-X architecture, this model is designed to run out of the box on consumer hardware. It balances quality with customization flexibility and supports image generation at resolutions from 0.25 to 2 megapixels.
The models can be easily fine-tuned to meet specific needs and are optimized for consumer hardware; the Stable Diffusion 3.5 Medium and Large Turbo models in particular deliver high-quality results with minimal resource demands. The 3.5 Medium model requires 9.9 GB of VRAM (not including text encoders), ensuring broad compatibility with most consumer GPUs.
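If you have downloaded the weights, you can run the Medium model locally with the diffusers library. The following is a minimal sketch assuming a recent diffusers release with Stable Diffusion 3 support and a GPU with enough VRAM; the repository id and sampling values follow the model card at the time of writing, so double-check them there:

import torch
from diffusers import StableDiffusion3Pipeline

# Load the 3.5 Medium checkpoint in half precision to fit consumer GPUs
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-medium",  # repo id per the model card
    torch_dtype=torch.bfloat16,
)
pipe = pipe.to("cuda")

image = pipe(
    "A forest with red trees",
    num_inference_steps=40,  # the Turbo variant targets ~4 steps with guidance_scale=0.0
    guidance_scale=4.5,
).images[0]
image.save("forest.png")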
Comparison with other models
Stable Diffusion 3.5 Large leads in prompt adherence and rivals much larger models in image quality. The Large Turbo variant offers fast inference with quality results, while 3.5 Medium stands out as an efficient, high-performing option among medium-sized models.
Accessing Stable Diffusion 3.5
On the Stability AI platform
Go to the Stability AI platform page (https://platform.stability.ai/account/keys) and get your API key. (You get 25 free credits after registering.)
Run the following Python code in a Jupyter environment (replace the API key placeholder in the code) to generate an image; change the prompt if you want.
import requests

API_KEY = "sk-..."  # replace with your Stability AI API key

response = requests.post(
    "https://api.stability.ai/v2beta/stable-image/generate/sd3",
    headers={
        "authorization": f"Bearer {API_KEY}",
        "accept": "image/*",
    },
    files={"none": ""},  # the endpoint expects multipart form data
    data={
        "prompt": "A middle-aged man wearing formal clothes",
        "output_format": "jpeg",
    },
)

if response.status_code == 200:
    # Save the returned image bytes to disk
    with open("./man.jpeg", "wb") as file:
        file.write(response.content)
else:
    raise Exception(str(response.json()))
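According to Stability AI's API documentation, the same endpoint can also target a specific 3.5 variant through a model field in the form data. The variant names below are taken from those docs at the time of writing, so treat this as a sketch and verify against the current API reference; pass this dict as the data argument in the request above:

# Select a specific checkpoint via the "model" form field
data = {
    "prompt": "A middle-aged man wearing formal clothes",
    "model": "sd3.5-large-turbo",  # or "sd3.5-large" / "sd3.5-medium"
    "output_format": "jpeg",
}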
I asked the model to generate an image of “A middle-aged man wearing formal clothes”, and it seems to perform well at generating photorealistic images.
On Hugging Face
You can use the model directly on Hugging Face.
First, click on the link, and then you can start running inference with the Stable Diffusion 3.5 Medium model.
This is the interface you will be greeted with:
I asked the model to generate an image of “A Forest with Red Trees” and it did a wonderful job generating this 1024 x 1024 image.
Feel free to play with the advanced settings to see how the result changes.
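You can also call the hosted demo programmatically with the gradio_client library. Note that the Space id, endpoint name, and parameter names below are assumptions for illustration; check the Space's "Use via API" tab for the exact signature:

from gradio_client import Client

# Hypothetical Space id and endpoint name; verify on the Space page
client = Client("stabilityai/stable-diffusion-3.5-medium")
result = client.predict(
    prompt="A forest with red trees",
    api_name="/infer",  # assumed endpoint; see "Use via API"
)
print(result)  # typically a path to the generated image file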
Using the Inference API on Hugging Face:
Step 1: Visit the Stable Diffusion 3.5 Large model page on Hugging Face.
Note: You can choose a different model and see the available options on Hugging Face.
Step 2: Fill in the required details to request access, as it is a gated model, and wait a moment. Once you have been granted access, you will be able to use the model.
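If you also plan to pull the gated weights from scripts (for example with diffusers), you can authenticate your environment with the huggingface_hub library. This is standard Hugging Face tooling, not something specific to this model:

from huggingface_hub import login

# Prompts for your Hugging Face token; alternatively pass token="hf_..."
login()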
Step 3: You can now run this Python code in a Jupyter environment to send prompts to the model. (Make sure to replace the Hugging Face token in the header.)
import io

import requests
from PIL import Image

API_URL = "https://api-inference.huggingface.co/models/stabilityai/stable-diffusion-3.5-large"
headers = {"Authorization": "Bearer hf_token"}  # replace hf_token with your Hugging Face token

def query(payload):
    # Send the prompt to the Inference API and return the raw image bytes
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.content

image_bytes = query({
    "inputs": "A ninja sitting on top of a tall building, 8k",
})

# Open the returned bytes with PIL
image = Image.open(io.BytesIO(image_bytes))
image
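One caveat: when the request fails (for example with a 503 while the model is still loading, or a 401 for a bad token), the Inference API returns a JSON error body rather than image bytes, and Image.open will fail on it. Here is a small defensive variant of query, assuming the same API_URL and headers as above:

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    # On failure the body is JSON, not image bytes; surface the error
    # instead of letting PIL fail on it downstream.
    if response.status_code != 200:
        raise Exception(response.json())
    return response.content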
You can change the prompt and try generating different types of images.
Conclusion
In conclusion, Stable Diffusion 3.5 offers a solid range of image generation models with different performance levels, tailored for both professional and consumer use. The lineup of Large, Large Turbo, and Medium models provides flexibility across quality and speed, making it an excellent choice for a variety of applications. With simple access options through the Stability AI platform, Hugging Face, and API integrations, Stable Diffusion 3.5 makes it easy to generate high-quality AI-generated images.
Also, if you are looking for a generative AI course, explore the GenAI Pinnacle Program.
Frequently asked questions
Q1. How are API requests authenticated?
Answer. API requests require an API key for authentication, which must be included in the request header to access the various endpoints.
Q2. What are common API errors?
Answer. Common errors include unauthorized access, invalid parameters, and exceeded usage limits, each with a specific response code for troubleshooting.
Q3. Is Stable Diffusion 3.5 free to use?
Answer. The models are free under the Stability Community License for research, non-commercial use, and organizations with annual revenue under $1 million. Larger entities require a business license.
Q4. What architecture does Stable Diffusion 3.5 use?
Answer. It uses a Multimodal Diffusion Transformer (MMDiT-X) with enhanced training techniques, such as QK normalization and dual attention, for improved multi-resolution image generation.
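For the curious, QK normalization simply normalizes queries and keys before the attention product to keep the attention logits in a stable range during training. The PyTorch snippet below is a generic sketch of the idea (here with RMS normalization over the head dimension), not Stability AI's exact implementation:

import torch

def qk_normalized_attention(q, k, v, eps=1e-6):
    # RMS-normalize queries and keys along the head dimension
    q = q * torch.rsqrt(q.pow(2).mean(-1, keepdim=True) + eps)
    k = k * torch.rsqrt(k.pow(2).mean(-1, keepdim=True) + eps)
    # Standard scaled dot-product attention on the normalized tensors
    scale = q.shape[-1] ** -0.5
    attn = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    return attn @ v

q = k = v = torch.randn(1, 8, 16, 64)  # (batch, heads, tokens, head_dim)
out = qk_normalized_attention(q, k, v)  # shape: (1, 8, 16, 64)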