Have you ever admired how smartphone cameras isolate the main subject from the background, adding a subtle blur to the background depending on depth? This “portrait mode” effect gives photos a professional look by simulating a shallow depth of field similar to DSLR cameras. In this tutorial, we will recreate this effect programmatically using open source computer vision models such as Meta's SAM2 and Intel ISL's MiDaS.
To build our pipeline, we will use:
- Segment Anything Model 2 (SAM2): To segment objects of interest and separate the foreground from the background.
- Depth estimation model: To calculate a depth map, enabling depth-based blurring.
- Gaussian blur: To blur the background with an intensity that varies depending on the depth.
Step 1: Set up the environment
To get started, install the following dependencies:
pip install matplotlib samv2 pytest opencv-python timm pillow
Step 2: Load a Target Image
Choose an image to apply the effect to and load it into Python using the Pillow library.
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
image_path = ".jpg"  # replace with the path to your image
img = Image.open(image_path)
img_array = np.array(img)
# Display the image
plt.imshow(img)
plt.axis("off")
plt.show()
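Later steps index the color channels of img_array, which assumes an array of shape height x width x 3. Grayscale images and images with an alpha channel do not have that shape. The following is a minimal sketch of a guard; ensure_rgb_array is a hypothetical helper introduced here for illustration and is not part of the original tutorial.

```python
import numpy as np


def ensure_rgb_array(arr):
    """Return an H x W x 3 uint8 array from a grayscale, RGB, or RGBA array.

    Hypothetical helper for illustration; not part of the original tutorial.
    """
    arr = np.asarray(arr)
    if arr.ndim == 2:
        # Grayscale: repeat the single channel three times
        arr = np.stack([arr, arr, arr], axis=-1)
    elif arr.ndim == 3 and arr.shape[2] == 4:
        # RGBA: drop the alpha channel
        arr = arr[..., :3]
    elif not (arr.ndim == 3 and arr.shape[2] == 3):
        raise ValueError(f"Unsupported image shape: {arr.shape}")
    return arr.astype(np.uint8)


# Example with synthetic data standing in for a loaded image
gray = np.zeros((4, 5))
rgba = np.zeros((4, 5, 4))
print(ensure_rgb_array(gray).shape, ensure_rgb_array(rgba).shape)
```

With Pillow, calling Image.open(image_path).convert("RGB") before the array conversion achieves the same result.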
Step 3: Initialize SAM2
To initialize the model, download the pre-trained checkpoint. SAM2 offers four variants based on performance and inference speed: tiny, small, base_plus, and large. In this tutorial, we will use tiny for faster inference.
Download the model checkpoint from: https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_<model_type>.pt
Replace <model_type> with the variant you want.
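As a small sketch of the download step, the snippet below builds the checkpoint URL for a chosen variant, assuming the variant name goes directly after sam2_hiera_ in the file name. The names checkpoint_url and download_checkpoint are hypothetical helpers, not part of SAM2 or samv2.

```python
import os
import urllib.request

BASE_URL = "https://dl.fbaipublicfiles.com/segment_anything_2/072824"
VARIANTS = ("tiny", "small", "base_plus", "large")


def checkpoint_url(variant):
    """Build the checkpoint URL for a variant (assumed naming pattern)."""
    if variant not in VARIANTS:
        raise ValueError(f"Unknown variant {variant!r}; expected one of {VARIANTS}")
    return f"{BASE_URL}/sam2_hiera_{variant}.pt"


def download_checkpoint(variant, dest_dir="."):
    """Download the checkpoint unless it already exists; return the local path."""
    url = checkpoint_url(variant)
    dest = os.path.join(dest_dir, os.path.basename(url))
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest


print(checkpoint_url("tiny"))
```

Calling download_checkpoint("tiny") would then fetch the file into the current directory on first use.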
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor
from sam2.utils.misc import variant_to_config_mapping
from sam2.utils.visualization import show_masks
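# Assumed arguments: the config for the "tiny" variant and the local path of the
# downloaded checkpoint; adjust both if you chose a different variant.
model = build_sam2(
    variant_to_config_mapping["tiny"],
    "sam2_hiera_tiny.pt",
)
image_predictor = SAM2ImagePredictor(model)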
Step 4: Feed the image to SAM and select the subject
Pass the image to SAM and provide points that lie on the subject you want to isolate. SAM predicts a binary mask that separates the subject from the background.
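image_predictor.set_image(img_array)
# (x, y) pixel coordinates on the subject; label 1 marks a foreground point
input_point = np.array([[2500, 1200], [2500, 1500], [2500, 2000]])
input_label = np.array([1, 1, 1])
# Assumed arguments: pass the point prompts and request several candidate masks
masks, scores, logits = image_predictor.predict(
    point_coords=input_point, point_labels=input_label, multimask_output=True
)
output_mask = show_masks(img_array, masks, scores)
sorted_ind = np.argsort(scores)[::-1]  # mask indices, best score first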
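The prompt points in this step are absolute pixel coordinates, so they only make sense for an image of a particular size. One possible way to make them portable is to express them as fractions of the image width and height. The sketch below assumes a hypothetical helper, points_from_fractions, that is not part of SAM2.

```python
import numpy as np


def points_from_fractions(fractions, image_shape):
    """Convert (x_frac, y_frac) pairs in [0, 1] to integer (x, y) pixel coordinates.

    Hypothetical helper for illustration; image_shape is (height, width, ...).
    """
    height, width = image_shape[:2]
    fractions = np.asarray(fractions, dtype=float)
    if fractions.ndim != 2 or fractions.shape[1] != 2:
        raise ValueError("fractions must have shape (N, 2)")
    if np.any(fractions < 0) or np.any(fractions > 1):
        raise ValueError("fractions must lie in [0, 1]")
    # x scales with the width, y with the height
    scale = np.array([width - 1, height - 1])
    return np.rint(fractions * scale).astype(int)


# Three points down the vertical center line of a 3000 x 5000 (H x W) image
points = points_from_fractions([(0.5, 0.3), (0.5, 0.5), (0.5, 0.7)], (3000, 5000, 3))
labels = np.ones(len(points), dtype=int)  # 1 marks a foreground point
print(points)
print(labels)
```

The resulting arrays have the same layout as input_point and input_label above.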
Step 5: Initialize the depth estimation model
For depth estimation, we use MiDaS by Intel ISL. Like SAM, it comes in several variants that trade accuracy for speed. Note: The predicted depth map is inverted, meaning that larger values correspond to closer objects. We will reverse it in the next step so that larger values mean farther away.
import torch
import torchvision.transforms as transforms
model_type = "DPT_Large" # MiDaS v3 - Large (highest accuracy)
# Load the MiDaS model and switch it to inference mode
model = torch.hub.load("intel-isl/MiDaS", model_type)
model.eval()
# Load and preprocess image
transform = torch.hub.load("intel-isl/MiDaS", "transforms").dpt_transform
input_batch = transform(img_array)
# Perform depth estimation
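with torch.no_grad():
    prediction = model(input_batch)
    # Resize the prediction back to the input image resolution
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img_array.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()
prediction = prediction.cpu().numpy()
# Visualize the depth map
plt.imshow(prediction, cmap="plasma")
plt.colorbar(label="Relative Depth")
plt.title("Depth Map Visualization")
plt.show()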
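To make the note about the inverted depth map concrete, here is a minimal sketch that rescales an inverse-depth array to the range [0, 1] and flips it so that 0 is nearest and 1 is farthest. The function name to_distance_map and the toy values are assumptions for illustration; a real MiDaS prediction would take the place of the toy array.

```python
import numpy as np


def to_distance_map(inverse_depth):
    """Rescale inverse depth to [0, 1] and flip it: 0 = nearest, 1 = farthest.

    Hypothetical helper for illustration.
    """
    inverse_depth = np.asarray(inverse_depth, dtype=np.float32)
    lo, hi = inverse_depth.min(), inverse_depth.max()
    if hi == lo:
        # A flat map carries no depth ordering
        return np.zeros_like(inverse_depth)
    normalized = (inverse_depth - lo) / (hi - lo)
    return 1.0 - normalized


# Toy inverse-depth values: 10 is the closest pixel, 0 the farthest
toy = np.array([[10.0, 5.0], [2.0, 0.0]])
distance = to_distance_map(toy)
print(distance)
```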
Step 6: Apply Depth-Based Gaussian Blur
Here we implement depth-based blurring iteratively. Instead of applying a single large kernel, we apply a smaller Gaussian kernel repeatedly, with more passes for pixels that are farther away.
import cv2
def apply_depth_based_blur_iterative(image, depth_map, base_kernel_size=7, max_repeats=10):
    # GaussianBlur requires an odd kernel size
    if base_kernel_size % 2 == 0:
        base_kernel_size += 1
    # Invert depth map so that larger values mean farther away
    depth_map = np.max(depth_map) - depth_map
    # Normalize depth to the range [0, max_repeats]
    depth_normalized = cv2.normalize(depth_map, None, 0, max_repeats, cv2.NORM_MINMAX).astype(np.uint8)
    blurred_image = image.copy()
    for repeat in range(1, max_repeats + 1):
        # Pixels at this depth level or deeper receive another blur pass
        mask = depth_normalized >= repeat
        if np.any(mask):
            blurred_temp = cv2.GaussianBlur(blurred_image, (base_kernel_size, base_kernel_size), 0)
            for c in range(image.shape[2]):
                blurred_image[..., c][mask] = blurred_temp[..., c][mask]
    return blurred_image
blurred_image = apply_depth_based_blur_iterative(img_array, prediction, base_kernel_size=35, max_repeats=20)
# Visualize the result
plt.figure(figsize=(20, 10))
plt.subplot(1, 2, 1)
plt.imshow(img_array)
plt.title("Original Image")
plt.subplot(1, 2, 2)
plt.imshow(blurred_image)
plt.title("Depth-based Blurred Image")
plt.show()
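The iterative approach in this step rests on a property of Gaussian filters: applying the same Gaussian blur n times is equivalent to a single blur whose standard deviation is larger by a factor of sqrt(n), because the variances add under convolution. The following one-dimensional sketch checks this numerically with NumPy only. The helper names and the chosen sigma, radius, and pass count are illustrative assumptions.

```python
import numpy as np


def gaussian_kernel_1d(sigma, radius):
    """Sampled, normalized 1D Gaussian kernel of length 2 * radius + 1."""
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum()


def kernel_std(kernel):
    """Standard deviation of a symmetric kernel treated as a distribution."""
    x = np.arange(len(kernel)) - (len(kernel) - 1) / 2
    return float(np.sqrt(np.sum(kernel * x ** 2)))


sigma = 2.0
passes = 4
single = gaussian_kernel_1d(sigma, radius=12)

# Convolving the kernel with itself models blurring the same signal repeatedly
repeated = single.copy()
for _ in range(passes - 1):
    repeated = np.convolve(repeated, single)

effective_sigma = kernel_std(repeated)
expected_sigma = sigma * np.sqrt(passes)
print(effective_sigma, expected_sigma)
```

The two printed values should agree closely, so the number of passes acts as a dial for the effective blur radius.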
Step 7: Combine Foreground and Background
Finally, use the SAM mask to extract the sharp foreground and blend it with the blurred background.
def combine_foreground_background(foreground, background, mask):
    # Add a channel axis so the 2D mask broadcasts over the color channels
    if mask.ndim == 2:
        mask = np.expand_dims(mask, axis=-1)
    return np.where(mask, foreground, background)
# Take the highest-scoring mask and resize it to the image size (width, height)
mask = masks[sorted_ind[0]].astype(np.uint8)
mask = cv2.resize(mask, (img_array.shape[1], img_array.shape[0]))
foreground = img_array
background = blurred_image
combined_image = combine_foreground_background(foreground, background, mask)
plt.figure(figsize=(20, 10))
plt.subplot(1, 2, 1)
plt.imshow(img_array)
plt.title("Original Image")
plt.subplot(1, 2, 2)
plt.imshow(combined_image)
plt.title("Final Portrait Mode Effect")
plt.show()
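The compositing in this step relies on NumPy broadcasting: a mask of shape (H, W, 1) is stretched across the three color channels when passed to np.where. Here is a toy sketch with small synthetic arrays standing in for the real images; the pixel values are arbitrary.

```python
import numpy as np

# Synthetic 2 x 3 RGB images: a bright "foreground" and a dark "background"
foreground = np.full((2, 3, 3), 200, dtype=np.uint8)
background = np.full((2, 3, 3), 50, dtype=np.uint8)

# Binary mask: 1 where the subject is, 0 elsewhere
mask = np.array([[1, 0, 0],
                 [1, 1, 0]], dtype=np.uint8)

# Add a trailing axis so the mask has shape (2, 3, 1) and broadcasts over channels
mask_3d = mask[..., np.newaxis].astype(bool)
combined = np.where(mask_3d, foreground, background)

print(combined[..., 0])
```

Every channel of a pixel is taken from the foreground where the mask is set and from the background otherwise.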
With just a few tools, we have recreated the portrait mode effect programmatically. This technique can be extended for photo editing applications, simulating camera effects, or creative projects.
Future improvements:
- Use edge detection algorithms to refine the subject's edges.
- Experiment with kernel sizes to improve the blur effect.
- Create a user interface to upload images and select subjects dynamically.
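As one simple starting point for softer subject edges, the sketch below feathers the binary mask with a box filter and alpha-blends the two layers. This is a swapped-in technique (mask feathering), not edge detection, and it is written with NumPy only. The helper names feather_mask and alpha_blend and the toy data are assumptions for illustration.

```python
import numpy as np


def feather_mask(mask, radius=1):
    """Soften a binary mask with a (2 * radius + 1)-wide box filter.

    Returns float32 values in [0, 1]. Hypothetical helper for illustration.
    """
    mask = np.asarray(mask, dtype=np.float32)
    padded = np.pad(mask, radius, mode="edge")
    size = 2 * radius + 1
    height, width = mask.shape
    out = np.zeros_like(mask)
    # Sum all shifted copies of the mask inside the window, then average
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + height, dx:dx + width]
    return out / (size * size)


def alpha_blend(foreground, background, alpha):
    """Blend two H x W x 3 uint8 images with a per-pixel alpha in [0, 1]."""
    alpha = alpha[..., np.newaxis]
    blended = alpha * foreground + (1.0 - alpha) * background
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)


# Toy data: the subject occupies the left half of a 6 x 6 image
mask = np.zeros((6, 6), dtype=np.uint8)
mask[:, :3] = 1
foreground = np.full((6, 6, 3), 255, dtype=np.uint8)
background = np.zeros((6, 6, 3), dtype=np.uint8)

alpha = feather_mask(mask, radius=1)
blended = alpha_blend(foreground, background, alpha)
print(blended[0, :, 0])
```

Pixels near the mask boundary receive intermediate values, which avoids the hard cut-out look of a purely binary mask.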
References:
- Segment Anything Model 2 (SAM2) by Meta (https://github.com/facebookresearch/sam2)
- CPU-friendly implementation of SAM 2 (https://github.com/SauravMaheshkar/samv2/tree/main)
- MiDaS depth estimation model (https://pytorch.org/hub/intelisl_midas_v2/)
Vineet Kumar is a Consulting Intern at MarktechPost. He is currently pursuing his bachelor's degree at the Indian Institute of Technology (IIT), Kanpur. He is a machine learning enthusiast who is passionate about research and the latest advances in deep learning, computer vision, and related fields.