Image by author
Have you ever wondered how people generate such hyper-realistic faces with AI image generators, while your own attempts come out full of glitches and artifacts that make them look obviously fake? You've tried tweaking the prompt and the settings, but you still can't match the quality you see others produce. What are you doing wrong?
In this blog post, I'll walk you through 3 key techniques for generating hyper-realistic human faces with Stable Diffusion. First, we'll cover the basics of prompt engineering to help you generate images with the base model. Next, we'll explore how upgrading to the Stable Diffusion XL model significantly improves image quality through increased parameters and training. Finally, I'll introduce a custom model fine-tuned specifically for generating high-quality portraits.
First, we will learn how to write positive and negative prompts to generate realistic faces. We will use the Stable Diffusion version 2.1 demo available on Hugging Face Spaces. It's free and you can get started without setting anything up.
Link: hf.co/spaces/stabilityai/stable-diffusion
When writing the positive prompt, be sure to include all the necessary details and the style of the image. In this case, we want to generate an image of a young woman walking down the street. We'll use a generic negative prompt, but you can add extra keywords to avoid recurring errors in the image.
Positive prompt: “A young woman in her 20s, walking through the streets, looking directly at the camera, confident and friendly expression, dressed casually in modern stylish clothing, urban scene background, bright sunny lighting, vibrant colors”
Negative prompt: “disfigured, ugly, bad, immature, cartoon, anime, 3d, painting, b&w, illustration, worst quality, low quality”
We're off to a good start. The images match the prompt, but their quality could be better. You can keep tweaking the prompts, but this is about the best you'll get from the base model.
Next, we will use the Stable Diffusion XL (SDXL) model to generate high-quality images. It achieves this by generating latents with the base model and then passing them through a refiner to produce detailed, precise images.
Link: hf.co/spaces/hysts/SD-XL
Before generating any images, scroll down and open the “Advanced Options”. We will add a negative prompt, set the seed, and enable the refiner to get the best image quality.
Then, we will write the same prompt as before with one minor change: instead of a generic young woman, we will generate the image of a young Indian woman.
This is a much better result. The facial features are perfect. Let's try to generate other ethnicities to check for biases and compare the results.
We have realistic faces, but every image looks like it has an Instagram filter applied. Skin is not this smooth in real life: real skin has acne, marks, freckles, and fine lines.
In this part, we will generate detailed faces with realistic skin and markings. To do this, we will use a custom model from CivitAI (RealVisXL V2.0) that was fine-tuned for high-quality portraits.
Link: civitai.com/models/139562/realvisxl-v20
You can use the model online by clicking the “Create” button, or download it to run locally with the Stable Diffusion WebUI.
First, download the model and move the file to the Stable Diffusion WebUI model directory: C:\WebUI\webui\models\Stable-diffusion.
To display the model in the WebUI, you need to press the refresh button and then select the model checkpoint “realvisxl20…”.
We'll start by writing the same positive and negative prompts and generating a high-quality 1024×1024 image.
The image looks great, but to get the most out of the custom model, we have to change our prompt.
You can find new positive and negative prompts by scrolling down the model page and clicking on a realistic image you like. CivitAI images come with their positive and negative prompts and advanced generation settings.
Positive prompt: “An image of a young Indian woman, focused, decisive, surreal, dynamic pose, ultra high resolution, sharp texture, high detail RAW photo, detailed face, shallow depth of field, sharp eyes, (realistic skin texture:1.2), clear skin, dslr, film grain”
Negative prompt: “(worst quality, low quality, illustration, 3d, 2d, painting, cartoons, sketch), open mouth”
We now have a detailed image of an Indian woman with realistic skin, a clear improvement over the base SDXL model.
We generated three more images to compare different ethnicities. The results are phenomenal, with skin marks, visible pores, and accurate facial features.
Generative art will soon reach a level where we will struggle to tell real and synthetic images apart. This points to a near future where anyone can create highly realistic media from simple text prompts by leveraging custom models trained on diverse real-world data. The rapid progress implies exciting potential: perhaps one day, generating a photorealistic video that replicates your own likeness and speech patterns could be as simple as writing a descriptive prompt.
In this post, we covered prompt engineering, the more advanced Stable Diffusion XL model, and fine-tuned custom models for generating highly accurate and realistic faces. If you want even better results, I suggest exploring the many high-quality models available at civitai.com.
Abid Ali Awan (@1abidaliawan) is a certified professional data scientist who loves building machine learning models. Currently, he focuses on content creation and writing technical blogs on data science and machine learning technologies. Abid holds a Master's degree in Technology Management and a Bachelor's degree in Telecommunications Engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.