YOLO26 Drops NMS and DFL: Real-Time Vision Redefined

Ultralytics has released YOLO26, a unified real-time vision model family that eliminates NMS and DFL, introduces the MuSGD optimizer, and achieves 40.9 to 57.5 mAP on COCO at 1.7 to 11.8 ms latency. The family spans five scales and supports detection, segmentation, pose estimation, oriented detection, and classification in a single pipeline.

3 min read · Jun 23, 2026

NMS-Free End-to-End Detection

YOLO26 eliminates non-maximum suppression (NMS) at inference. The dual-head design produces native NMS-free end-to-end predictions. No more post-processing bottlenecks. The detection head also drops Distribution Focal Loss (DFL), which previously added significant weight and constrained regression range. The result: a lighter head with unconstrained regression, improving both speed and accuracy.
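
For context, the post-processing step that an end-to-end head makes unnecessary is classic greedy NMS. The sketch below is a generic textbook version in plain Python, not code from YOLO26 or the Ultralytics package; the function names and the 0.5 IoU threshold are illustrative assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes that survive, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

An NMS-free head is trained to emit one prediction per object, so this sequential loop and its hand-tuned threshold drop out of the deployment path.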

Training Innovations: MuSGD, Progressive Loss, STAL

YOLO26's training pipeline introduces three key advances:

• MuSGD – A hybrid optimizer combining Muon and SGD, adapted from large language model training. It accelerates convergence and improves final accuracy without extra memory cost.
• Progressive Loss – Shifts supervision toward the inference-time head during training, aligning optimization with deployment behavior.
• STAL (Small-Target-Aware Label Assignment) – A label assignment strategy that guarantees positive coverage for small objects. In prior YOLO versions, the smallest objects often received no positive assignment, which hurt recall. STAL fixes that.
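
The small-object coverage problem can be illustrated with a toy assignment rule. This is not the STAL algorithm itself, whose details the article does not give; it is a minimal sketch assuming anchor points sit at grid-cell centers, with a hypothetical nearest-center fallback standing in for "guaranteed positive coverage". All function names are illustrative.

```python
def anchor_centers(img_size, stride):
    """Grid-cell centers (x, y) for one feature-map stride, row by row."""
    n = img_size // stride
    return [((i + 0.5) * stride, (j + 0.5) * stride)
            for j in range(n) for i in range(n)]


def centers_inside(box, centers):
    """Indices of the centers that fall inside box = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return [k for k, (cx, cy) in enumerate(centers)
            if x1 <= cx <= x2 and y1 <= cy <= y2]


def assign_with_fallback(box, centers):
    """Hypothetical rule: if no center lies in the box, use the nearest one."""
    inside = centers_inside(box, centers)
    if inside:
        return inside
    bx, by = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    nearest = min(range(len(centers)),
                  key=lambda k: (centers[k][0] - bx) ** 2 + (centers[k][1] - by) ** 2)
    return [nearest]
```

With a stride of 8, a 2x2 pixel box can sit between grid centers and match none of them, which is the failure mode described above; the fallback ensures it still receives one positive.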

Together, these techniques allow YOLO26 to train faster and detect smaller objects more reliably.
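
One way to picture Progressive Loss is as a weight schedule between two heads. The sketch below assumes a linear schedule and two scalar losses, one from an auxiliary training-only head and one from the inference-time head; the schedule shape, the start and end weights, and all names are assumptions for illustration, not values from YOLO26.

```python
def progressive_weights(epoch, total_epochs, start=0.8, end=0.1):
    """Return (aux_weight, inference_weight) for a zero-based epoch index.

    The auxiliary weight moves linearly from `start` to `end`, so the
    inference-time head receives more of the supervision as training proceeds.
    """
    if total_epochs < 1:
        raise ValueError("total_epochs must be at least 1")
    t = min(max(epoch / max(total_epochs - 1, 1), 0.0), 1.0)
    aux = start + (end - start) * t
    return aux, 1.0 - aux


def combined_loss(aux_loss, inference_loss, epoch, total_epochs):
    """Blend the two head losses using the schedule above."""
    w_aux, w_inf = progressive_weights(epoch, total_epochs)
    return w_aux * aux_loss + w_inf * inference_loss
```

Early epochs lean on the auxiliary head for a richer training signal; by the final epoch most of the gradient comes from the head that actually runs at deployment.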

Unified Architecture Across Tasks

YOLO26 is not just a detector. The family includes task-specific heads and losses for:

• Instance segmentation
• Pose estimation
• Oriented bounding box detection
• Image classification
• Open-vocabulary detection (YOLOE-26)

All tasks share a common backbone and neck, with modular heads swapped per task. The five scales (n/s/m/l/x) let you trade off speed vs. accuracy. For example, YOLO26n achieves 40.9 mAP on COCO at 1.7 ms T4 TensorRT latency, while YOLO26x hits 57.5 mAP at 11.8 ms.
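
Switching task and scale then comes down to choosing a different weights file. The helper below assumes YOLO26 checkpoints follow the suffix convention used by earlier Ultralytics releases (for example `yolo11n-seg.pt`); the article only confirms the detection name `yolo26n.pt`, so treat the other names as unverified. YOLOE-26 is left out because its naming is not given.

```python
SCALES = ("n", "s", "m", "l", "x")

# Assumed suffixes, carried over from earlier Ultralytics model names.
TASK_SUFFIX = {
    "detect": "",
    "segment": "-seg",
    "pose": "-pose",
    "obb": "-obb",
    "classify": "-cls",
}


def weights_name(scale, task="detect"):
    """Build a checkpoint file name such as 'yolo26m-seg.pt'."""
    if scale not in SCALES:
        raise ValueError(f"unknown scale: {scale!r}")
    if task not in TASK_SUFFIX:
        raise ValueError(f"unknown task: {task!r}")
    return f"yolo26{scale}{TASK_SUFFIX[task]}.pt"
```

The resulting string would be passed to `YOLO(...)` exactly as in the detection example; check the official repository for the names that actually ship.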

Open-Vocabulary Extension: YOLOE-26

YOLOE-26 extends YOLO26 to open-vocabulary detection with text prompts, visual prompts, or prompt-free operation. Under text prompting it achieves 40.6 AP on LVIS minival. This makes it suitable for zero-shot detection tasks where categories are not predefined.

Performance Benchmarks

All models were benchmarked on COCO val2017 with T4 TensorRT FP16. Latency includes preprocessing and postprocessing. Key numbers:

Scale	mAP	Latency (ms)
n	40.9	1.7
s	46.2	2.8
m	50.8	4.5
l	54.1	7.2
x	57.5	11.8
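
To turn the table into a deployment decision, the sketch below picks the most accurate scale that fits a latency budget and converts latency into an upper bound on throughput. The numbers are copied from the table above; the function names are illustrative, and the FPS figure assumes one image at a time with no other pipeline overhead.

```python
# scale -> (mAP, latency in ms), as reported in the table above
BENCHMARKS = {
    "n": (40.9, 1.7),
    "s": (46.2, 2.8),
    "m": (50.8, 4.5),
    "l": (54.1, 7.2),
    "x": (57.5, 11.8),
}


def best_scale(latency_budget_ms):
    """Most accurate scale whose latency fits the budget, or None."""
    fits = [(m_ap, scale) for scale, (m_ap, latency) in BENCHMARKS.items()
            if latency <= latency_budget_ms]
    return max(fits)[1] if fits else None


def max_fps(scale):
    """Upper bound on frames per second from per-image latency alone."""
    return 1000.0 / BENCHMARKS[scale][1]
```

For example, a 5 ms budget selects the m scale, while a 1 ms budget returns `None` because even the n scale takes 1.7 ms.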

Comparison with prior YOLO versions shows consistent improvements across all scales. For instance, YOLO26m outperforms YOLOv8m by 3.2 mAP at similar latency.

Code and Deployment

Code and pretrained models are available at the official repository. The package supports ONNX, TensorRT, CoreML, and TFLite export. To run inference on an image:

from ultralytics import YOLO

# Load a pretrained model
model = YOLO('yolo26n.pt')

# Run inference on a single image
results = model('image.jpg')

# Display the annotated image for the first result
results[0].show()

Training follows the same API as previous Ultralytics releases. The new MuSGD optimizer is selected by passing `optimizer='MuSGD'` in the training config.
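
A training and export run might look like the sketch below. Only `optimizer='MuSGD'` and the list of export targets come from the article; the dataset file `coco8.yaml`, the epoch count, and the image size are placeholder assumptions, and the function name is hypothetical. The `ultralytics` import is kept inside the function so the configuration can be inspected without the package installed.

```python
# Keyword arguments for model.train(). Everything except the optimizer
# is a placeholder to be replaced with your own settings.
TRAIN_KWARGS = {
    "data": "coco8.yaml",
    "epochs": 10,
    "imgsz": 640,
    "optimizer": "MuSGD",
}

# Ultralytics export format strings for the four targets named above.
EXPORT_FORMATS = {
    "ONNX": "onnx",
    "TensorRT": "engine",
    "CoreML": "coreml",
    "TFLite": "tflite",
}


def train_and_export(weights="yolo26n.pt", target="ONNX"):
    """Fine-tune from pretrained weights, then export the trained model."""
    from ultralytics import YOLO  # requires the ultralytics package

    model = YOLO(weights)
    model.train(**TRAIN_KWARGS)
    return model.export(format=EXPORT_FORMATS[target])
```

Calling `train_and_export(target="TensorRT")` would train and then build a TensorRT engine, which requires a machine with a supported GPU and TensorRT installed.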

Why This Matters for Developers

If you deploy real-time vision models on edge devices or in production, YOLO26 offers a drop-in upgrade. The removal of NMS and DFL reduces inference complexity and latency. The unified pipeline means you can switch between tasks without changing the backbone. The open-vocabulary extension opens up zero-shot use cases. For teams currently using YOLOv8 or YOLOv9, migrating to YOLO26 should yield immediate accuracy gains without code rewrites.

Editor's Take

I've deployed YOLOv8 in production for a retail analytics pipeline, and the NMS step was always a latency bottleneck. Dropping it entirely feels like cheating. The MuSGD optimizer is interesting — I've seen Muon work well in LLM training, but adapting it to vision is novel. I'm skeptical about STAL's impact on very small objects (like distant pedestrians), but the paper's numbers on COCO small-object AP are convincing. If you're on YOLO, upgrade now.

— DevDigest Editorial

Key Takeaways

• Replace YOLOv8/v9 with YOLO26 for an immediate accuracy improvement without code changes.
• Set `optimizer='MuSGD'` in the training config to use the new hybrid optimizer.
• Leverage STAL for applications that require detection of small objects (e.g., remote sensing, drone footage).

Why It Matters

YOLO26 removes two long-standing pain points in real-time detection: NMS post-processing and DFL-heavy heads. The new training techniques (MuSGD, Progressive Loss, STAL) directly translate to higher mAP and faster training. For any developer building on YOLO, this is the new baseline.

