YOLO (You Only Look Once) has long been a leading real-time object detection framework, with each iteration improving on the last. The latest release, YOLOv12, introduces advances that significantly improve accuracy while maintaining real-time processing speeds. This article explores the key innovations in YOLOv12, highlighting how it surpasses previous versions while minimizing computational cost without compromising detection efficiency.
What's new in YOLOv12?
Previous YOLO models relied on convolutional neural networks (CNNs) for object detection because of their speed and efficiency. YOLOv12, however, uses attention mechanisms, a concept widely known and used in transformer models, to recognize patterns more effectively. Although attention mechanisms have historically been too slow for real-time object detection, YOLOv12 integrates them while preserving YOLO's speed, resulting in an attention-centric YOLO framework.
Key improvements over previous versions
1. Attention-centric framework
YOLOv12 combines the power of attention mechanisms with CNNs, resulting in a model that is both faster and more accurate. Unlike its predecessors, which relied solely on CNNs, YOLOv12 introduces optimized attention modules to improve object recognition without adding unnecessary latency.
2. Superior performance metrics
Comparing performance metrics across YOLO versions and other real-time detection models shows that YOLOv12 achieves higher accuracy while maintaining low latency.
- mAP (mean Average Precision) values on datasets such as COCO show YOLOv12 outperforming YOLOv11 and YOLOv10 while maintaining comparable speed.
- The model achieves a notable 40.6% mAP while processing images in just 1.64 milliseconds on an NVIDIA T4 GPU, beating YOLOv10 and YOLOv11 without sacrificing speed (a quick way to sanity-check latency on your own hardware is sketched below).
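Here's a minimal sketch for that check, assuming the ultralytics package is installed and a local test image (bus.jpg here is just a placeholder). Note that it times the full pipeline (pre-processing, inference, post-processing), so expect higher numbers than the pure-inference 1.64 ms figure:

import time
from ultralytics import YOLO

model = YOLO("yolo12s.pt")  # weights download automatically on first use
model.predict("bus.jpg", imgsz=640, verbose=False)  # warm-up run

# Average end-to-end latency over repeated predictions
runs = 50
start = time.perf_counter()
for _ in range(runs):
    model.predict("bus.jpg", imgsz=640, verbose=False)
print(f"Mean end-to-end latency: {(time.perf_counter() - start) / runs * 1000:.2f} ms")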
3. Outperforming non-YOLO models
YOLOv12 doesn't just surpass previous YOLO versions; it also beats other real-time object detection frameworks such as RT-DETR and RT-DETRv2. These alternatives have higher latency yet cannot match YOLOv12's accuracy.
Computational efficiency improvements
One of the main concerns with integrating attention mechanisms into YOLO models was their high computational cost and memory inefficiency. YOLOv12 addresses these problems through several key innovations:
1. FlashAttention for memory efficiency
Traditional attention mechanisms consume a large amount of memory, making them impractical for real-time applications. YOLOv12 introduces FlashAttention, a technique that reduces memory access overhead and accelerates inference.
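FlashAttention is a fused GPU kernel rather than a Python-level algorithm, but you can see the idea it optimizes with plain PyTorch, whose scaled_dot_product_attention dispatches to a FlashAttention kernel on supported GPUs. This is an illustrative sketch of the memory argument, not YOLOv12's internal code, and it assumes a CUDA-capable machine:

import torch
import torch.nn.functional as F

# Toy tensors: batch 1, 8 heads, 1024 tokens, 64-dim heads
q, k, v = (torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16) for _ in range(3))

# Naive attention materializes the full 1024x1024 score matrix in memory...
scores = (q @ k.transpose(-2, -1)) / (64 ** 0.5)
out_naive = scores.softmax(dim=-1) @ v

# ...while the fused kernel computes the same result tile by tile without
# ever storing the full score matrix (uses FlashAttention when available)
out_fused = F.scaled_dot_product_attention(q, k, v)

print(torch.allclose(out_naive, out_fused, atol=1e-2))  # same result, far less memory traffic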
2. Area Attention for lower computation cost
To further optimize efficiency, YOLOv12 uses Area Attention, which attends only within regions (areas) of an image instead of processing the entire feature map at once. This technique drastically reduces computation cost while retaining accuracy.
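To make the saving concrete, here is a rough, unofficial sketch of the partitioning idea in PyTorch: tokens from the flattened feature map are split into a few contiguous areas, and self-attention runs within each area, so every token attends over n/num_areas positions instead of all n:

import torch
import torch.nn.functional as F

def area_attention(x, num_areas=4):
    """Self-attention within contiguous areas of the token sequence (illustrative)."""
    b, n, c = x.shape                                 # n = H * W flattened tokens
    assert n % num_areas == 0
    x = x.reshape(b * num_areas, n // num_areas, c)   # split tokens into areas
    out = F.scaled_dot_product_attention(x, x, x)     # attention per area
    return out.reshape(b, n, c)

x = torch.randn(2, 1024, 256)   # e.g. a 32x32 feature map with 256 channels
print(area_attention(x).shape)  # torch.Size([2, 1024, 256])

With four areas, the quadratic attention term shrinks by roughly a factor of four while each area still keeps a large receptive field.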
3. R-ELAN for optimized feature processing
YOLOv12 also introduces R-ELAN (Residual Efficient Layer Aggregation Networks), which optimizes feature propagation, making the model more efficient at handling complex object detection tasks without increasing computational demands.
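The exact block design comes from the YOLOv12 paper, but the gist is ELAN-style aggregation (concatenate the outputs of a chain of blocks, then fuse them) wrapped in a scaled residual shortcut to stabilize training. Here is a loose, unofficial PyTorch sketch of that pattern, not the paper's implementation:

import torch
import torch.nn as nn

class RELANSketch(nn.Module):
    """Unofficial sketch: chained conv blocks, aggregated and fused, plus a scaled residual."""
    def __init__(self, c, n_blocks=2, scale=0.5):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Conv2d(c, c, 3, padding=1), nn.SiLU())
            for _ in range(n_blocks)
        )
        self.fuse = nn.Conv2d(c * (n_blocks + 1), c, 1)  # fuse input + all stage outputs
        self.scale = scale

    def forward(self, x):
        feats = [x]
        for block in self.blocks:
            feats.append(block(feats[-1]))  # each block feeds the next
        return self.scale * x + self.fuse(torch.cat(feats, dim=1))

print(RELANSketch(64)(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])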
YOLOv12 model variants
YOLOv12 comes in five variants, serving different applications (a quick loading sketch follows this list):
- N (Nano) and S (Small): designed for real-time applications where speed is crucial.
- M (Medium): balances accuracy and speed, suitable for general-purpose tasks.
- L (Large) and XL (Extra Large): optimized for high-precision tasks where accuracy is prioritized over speed.
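All five variants expose the same API, so switching between them is just a matter of swapping the weights file. A minimal sketch (the weight file names follow the Ultralytics convention used in the code later in this article; bus.jpg is a placeholder image):

from ultralytics import YOLO

# Pick the variant that fits your latency/accuracy budget
for weights in ("yolo12n.pt", "yolo12s.pt", "yolo12m.pt", "yolo12l.pt", "yolo12x.pt"):
    model = YOLO(weights)
    results = model.predict("bus.jpg", imgsz=640, verbose=False)
    print(weights, "->", len(results[0].boxes), "detections")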
Comparing the YOLOv11 and YOLOv12 models
We will experiment with the small (s) variants of YOLOv11 and YOLOv12 to compare their performance across several tasks: object counting, heatmaps, and speed estimation.
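Everything below relies on the Ultralytics solutions module plus OpenCV. Here's a quick environment check before running the comparisons (a standard setup sketch; install with pip install ultralytics opencv-python if needed):

# Quick environment check before running the comparisons below
import cv2
import ultralytics

ultralytics.checks()               # prints package version, Python, torch, CUDA availability
print("OpenCV:", cv2.__version__)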
1. Object counting
YOLOv11
import cv2
from ultralytics import solutions

cap = cv2.VideoCapture("highway.mp4")
assert cap.isOpened(), "Error reading video file"

w, h, fps = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)), int(cap.get(cv2.CAP_PROP_FPS)))

# Define region points (lower rectangle used for counting)
region_points = [(20, 1500), (1080, 1500), (1080, 1460), (20, 1460)]

# Video writer (MP4 format)
video_writer = cv2.VideoWriter("object_counting_output.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

# Init ObjectCounter
counter = solutions.ObjectCounter(
    show=False,  # Disable internal window display
    region=region_points,
    model="yolo11s.pt",
)

# Process video
while cap.isOpened():
    success, im0 = cap.read()
    if not success:
        print("Video frame is empty or video processing has been successfully completed.")
        break
    im0 = counter.count(im0)

    # Resize to fit screen (optional: scale down for large videos)
    im0_resized = cv2.resize(im0, (640, 360))  # Adjust resolution as needed

    # Show the resized frame
    cv2.imshow("Object Counting", im0_resized)
    video_writer.write(im0)

    # Press 'q' to exit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
video_writer.release()
cv2.destroyAllWindows()
Output
YOLOv12
import cv2
from ultralytics import solutions

cap = cv2.VideoCapture("highway.mp4")
assert cap.isOpened(), "Error reading video file"

w, h, fps = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)), int(cap.get(cv2.CAP_PROP_FPS)))

# Define region points (lower rectangle used for counting)
region_points = [(20, 1500), (1080, 1500), (1080, 1460), (20, 1460)]

# Video writer (MP4 format)
video_writer = cv2.VideoWriter("object_counting_output.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

# Init ObjectCounter
counter = solutions.ObjectCounter(
    show=False,  # Disable internal window display
    region=region_points,
    model="yolo12s.pt",
)

# Process video
while cap.isOpened():
    success, im0 = cap.read()
    if not success:
        print("Video frame is empty or video processing has been successfully completed.")
        break
    im0 = counter.count(im0)

    # Resize to fit screen (optional: scale down for large videos)
    im0_resized = cv2.resize(im0, (640, 360))  # Adjust resolution as needed

    # Show the resized frame
    cv2.imshow("Object Counting", im0_resized)
    video_writer.write(im0)

    # Press 'q' to exit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
video_writer.release()
cv2.destroyAllWindows()
Output
2. Heatmaps
YOLOv11
import cv2
from ultralytics import solutions

cap = cv2.VideoCapture("mall_arial.mp4")
assert cap.isOpened(), "Error reading video file"

w, h, fps = (int(cap.get(x)) for x in (cv2.CAP_PROP_FRAME_WIDTH, cv2.CAP_PROP_FRAME_HEIGHT, cv2.CAP_PROP_FPS))

# Video writer
video_writer = cv2.VideoWriter("heatmap_output_yolov11.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

# In case you want to apply object counting + heatmaps, you can pass region points.
# region_points = [(20, 400), (1080, 400)]  # Define line points
# region_points = [(20, 400), (1080, 400), (1080, 360), (20, 360)]  # Define region points
# region_points = [(20, 400), (1080, 400), (1080, 360), (20, 360), (20, 400)]  # Define polygon points

# Init heatmap
heatmap = solutions.Heatmap(
    show=True,  # Display the output
    model="yolo11s.pt",  # Path to the YOLO11 model file
    colormap=cv2.COLORMAP_PARULA,  # Colormap of heatmap
    # region=region_points,  # Pass region_points to combine object counting with heatmaps
    # classes=[0, 2],  # Generate the heatmap for specific classes only, e.g. person and car
    # show_in=True,  # Display in counts
    # show_out=True,  # Display out counts
    # line_width=2,  # Adjust the line width for bounding boxes and text display
)

# Process video
while cap.isOpened():
    success, im0 = cap.read()
    if not success:
        print("Video frame is empty or video processing has been successfully completed.")
        break
    im0 = heatmap.generate_heatmap(im0)
    im0_resized = cv2.resize(im0, (w, h))
    video_writer.write(im0_resized)

cap.release()
video_writer.release()
cv2.destroyAllWindows()
Output
YOLOv12
import cv2
from ultralytics import solutions

cap = cv2.VideoCapture("mall_arial.mp4")
assert cap.isOpened(), "Error reading video file"

w, h, fps = (int(cap.get(x)) for x in (cv2.CAP_PROP_FRAME_WIDTH, cv2.CAP_PROP_FRAME_HEIGHT, cv2.CAP_PROP_FPS))

# Video writer
video_writer = cv2.VideoWriter("heatmap_output_yolov12.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

# In case you want to apply object counting + heatmaps, you can pass region points.
# region_points = [(20, 400), (1080, 400)]  # Define line points
# region_points = [(20, 400), (1080, 400), (1080, 360), (20, 360)]  # Define region points
# region_points = [(20, 400), (1080, 400), (1080, 360), (20, 360), (20, 400)]  # Define polygon points

# Init heatmap
heatmap = solutions.Heatmap(
    show=True,  # Display the output
    model="yolo12s.pt",  # Path to the YOLO12 model file
    colormap=cv2.COLORMAP_PARULA,  # Colormap of heatmap
    # region=region_points,  # Pass region_points to combine object counting with heatmaps
    # classes=[0, 2],  # Generate the heatmap for specific classes only, e.g. person and car
    # show_in=True,  # Display in counts
    # show_out=True,  # Display out counts
    # line_width=2,  # Adjust the line width for bounding boxes and text display
)

# Process video
while cap.isOpened():
    success, im0 = cap.read()
    if not success:
        print("Video frame is empty or video processing has been successfully completed.")
        break
    im0 = heatmap.generate_heatmap(im0)
    im0_resized = cv2.resize(im0, (w, h))
    video_writer.write(im0_resized)

cap.release()
video_writer.release()
cv2.destroyAllWindows()
Output
3. Speed estimation
YOLOv11
import cv2
import numpy as np
from ultralytics import solutions

cap = cv2.VideoCapture("cars_on_road.mp4")
assert cap.isOpened(), "Error reading video file"

# Capture video properties
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = int(cap.get(cv2.CAP_PROP_FPS))

# Video writer
video_writer = cv2.VideoWriter("speed_management_yolov11.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

# Define speed region points (adjust for your video resolution)
speed_region = [(300, h - 200), (w - 100, h - 200), (w - 100, h - 270), (300, h - 270)]

# Initialize SpeedEstimator
speed = solutions.SpeedEstimator(
    show=False,  # Disable internal window display
    model="yolo11s.pt",  # Path to the YOLO model file
    region=speed_region,  # Pass region points
    # classes=[0, 2],  # Optional: filter specific object classes (e.g., cars, trucks)
    # line_width=2,  # Optional: adjust the line width
)

# Process video
while cap.isOpened():
    success, im0 = cap.read()
    if not success:
        print("Video frame is empty or video processing has been successfully completed.")
        break

    # Estimate speed and draw bounding boxes
    out = speed.estimate_speed(im0)

    # Draw the speed region on the frame (polylines expects a list of int32 point arrays)
    cv2.polylines(out, [np.array(speed_region, dtype=np.int32)], isClosed=True, color=(0, 255, 0), thickness=2)

    # Resize the frame to fit the screen
    im0_resized = cv2.resize(out, (1280, 720))  # Resize for better screen fit

    # Show the resized frame
    cv2.imshow("Speed Estimation", im0_resized)
    video_writer.write(out)

    # Press 'q' to exit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
video_writer.release()
cv2.destroyAllWindows()
Output
YOLOv12
import cv2
import numpy as np
from ultralytics import solutions

cap = cv2.VideoCapture("cars_on_road.mp4")
assert cap.isOpened(), "Error reading video file"

# Capture video properties
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = int(cap.get(cv2.CAP_PROP_FPS))

# Video writer
video_writer = cv2.VideoWriter("speed_management_yolov12.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

# Define speed region points (adjust for your video resolution)
speed_region = [(300, h - 200), (w - 100, h - 200), (w - 100, h - 270), (300, h - 270)]

# Initialize SpeedEstimator
speed = solutions.SpeedEstimator(
    show=False,  # Disable internal window display
    model="yolo12s.pt",  # Path to the YOLO model file
    region=speed_region,  # Pass region points
    # classes=[0, 2],  # Optional: filter specific object classes (e.g., cars, trucks)
    # line_width=2,  # Optional: adjust the line width
)

# Process video
while cap.isOpened():
    success, im0 = cap.read()
    if not success:
        print("Video frame is empty or video processing has been successfully completed.")
        break

    # Estimate speed and draw bounding boxes
    out = speed.estimate_speed(im0)

    # Draw the speed region on the frame (polylines expects a list of int32 point arrays)
    cv2.polylines(out, [np.array(speed_region, dtype=np.int32)], isClosed=True, color=(0, 255, 0), thickness=2)

    # Resize the frame to fit the screen
    im0_resized = cv2.resize(out, (1280, 720))  # Resize for better screen fit

    # Show the resized frame
    cv2.imshow("Speed Estimation", im0_resized)
    video_writer.write(out)

    # Press 'q' to exit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
video_writer.release()
cv2.destroyAllWindows()
Output
Also read: Top 30+ computer vision models for 2025
Expert opinions on YOLOv11 and YOLOv12
Muhammad Rizwan Munawar – Computer Vision Engineer at Ultralytics
“YOLOv12 introduces FlashAttention, which improves accuracy, but it requires careful CUDA configuration. It's a strong step forward, especially for complex detection tasks, although YOLOv11 remains faster for real-time needs. In short: choose YOLOv12 for accuracy and YOLOv11 for speed.”
LinkedIn post – Is it really a state-of-the-art model?
Muhammad Rizwan recently tested YOLOv11 and YOLOv12 side by side to break down their real-world performance. His findings highlight the trade-offs between the two models:
- Frames per second (FPS): YOLOv11 maintains an average of 40 FPS, while YOLOv12 lags behind at around 30 FPS. This makes YOLOv11 the better option for real-time applications where speed is critical, such as traffic monitoring or live video feeds.
- Training time: YOLOv12 takes about 20% longer to train than YOLOv11. On a small dataset with 130 training images and 43 validation images, YOLOv11 completed training in 0.009 hours, while YOLOv12 needed 0.011 hours. That difference may seem negligible for small datasets, but it becomes significant for large-scale projects.
- Accuracy: both models achieved similar accuracy after fine-tuning for 10 epochs on the same dataset. YOLOv12 did not dramatically outperform YOLOv11, suggesting that the newer model's gains lie more in architectural improvements than in raw detection accuracy.
- FlashAttention: YOLOv12 introduces FlashAttention, a powerful mechanism that speeds up and optimizes attention layers. There is a catch, though: the feature is not natively supported on CPUs, and enabling it with CUDA requires careful, version-specific configuration. For teams without powerful GPUs, or those working on edge devices, this can become an obstacle (a quick environment check is sketched after this list).
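Here's that check, using standard PyTorch introspection calls (what's available varies with your PyTorch version and GPU; the flash kernels generally require an Ampere-or-newer card, i.e. compute capability 8.0+):

import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    major, minor = torch.cuda.get_device_capability(0)
    print("Compute capability:", f"{major}.{minor}")  # flash kernels typically need >= 8.0
    print("Flash SDP enabled:", torch.backends.cuda.flash_sdp_enabled())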
PC specifications used for tests:
- GPU: NVIDIA RTX 3050
- CPU: Intel Core i5-10400 @ 2.90GHz
- RAM: 64 GB
Model specifications:
- Models = yolo11n.pt and yolo12n.pt (see the reproduction sketch after this list)
- Image size = 640 for inference
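To reproduce a comparison like this on your own data, you can fine-tune both nano models under identical settings and compare validation mAP. A minimal sketch, where custom_data.yaml is a placeholder for your own dataset config:

from ultralytics import YOLO

for weights in ("yolo11n.pt", "yolo12n.pt"):
    model = YOLO(weights)
    # Fine-tune both models under identical settings (10 epochs, 640 px, as above)
    model.train(data="custom_data.yaml", epochs=10, imgsz=640)
    metrics = model.val()  # evaluate on the validation split
    print(weights, "mAP50-95:", metrics.box.map)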
Conclusion
YOLOv12 marks a significant leap forward in real-time object detection, combining CNN speed with transformer-like attention mechanisms. With improved accuracy, lower computational costs, and a range of model variants, YOLOv12 is poised to redefine the landscape of real-time vision applications. Whether for autonomous vehicles, security surveillance, or medical imaging, YOLOv12 sets a new standard for real-time object detection efficiency.
What's next?
- YOLOv13 possibilities: will future versions push attention mechanisms even further?
- Edge-device optimization: can FlashAttention or Area Attention be tuned for lower-power devices?
To help you better understand the differences, I have included code snippets and output results in the comparison section above. These examples illustrate how YOLOv11 and YOLOv12 perform in real-world scenarios, from object counting to heatmaps and speed estimation. I'm excited to see how you find this new release! Do the improvements in accuracy and attention mechanisms justify the speed trade-off? Or do you think YOLOv11 still holds its ground for most applications?