NMS-Free End-to-End Detection

YOLO26 eliminates non-maximum suppression (NMS) at inference. Its dual-head design produces end-to-end predictions natively, so no separate duplicate-filtering pass runs after the forward pass and the post-processing bottleneck disappears. The detection head also drops Distribution Focal Loss (DFL), which previously made the head heavier and constrained the regression range. The result is a lighter head with unconstrained regression, improving both speed and accuracy.
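To make concrete what is being removed, here is a minimal sketch of classic greedy NMS in plain Python. It is not taken from the YOLO26 code base; the box format (x1, y1, x2, y2) and the 0.5 IoU threshold are assumptions chosen for illustration.

```python
# Greedy NMS sketch: the kind of post-processing an end-to-end head avoids.
# Boxes are (x1, y1, x2, y2); the 0.5 threshold is an illustrative default.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
print(kept)  # [0, 2]: the second box overlaps the first and is suppressed
```

The loop compares every candidate against all boxes kept so far, which is why NMS cost grows with the number of predictions and why it is awkward to fuse into an exported graph.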

Training Innovations: MuSGD, Progressive Loss, STAL

YOLO26's training pipeline introduces three key advances:

  • MuSGD – A hybrid optimizer combining Muon and SGD, adapted from large language model training. It accelerates convergence and improves final accuracy without extra memory cost.
  • Progressive Loss – Shifts supervision toward the inference-time head during training, aligning optimization with deployment behavior.
  • STAL (Small-Target-Aware Label Assignment) – A label assignment strategy that guarantees positive coverage for small objects. In prior YOLO versions, the smallest objects often received no positive assignment, which hurt recall. STAL closes that gap.

Together, these techniques allow YOLO26 to train faster and detect smaller objects more reliably.
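As a rough illustration of the Progressive Loss idea, the sketch below shifts loss weight linearly from a training-only head to the inference-time head over the course of training. The linear schedule, the 0.8 and 0.1 endpoints, and the names progressive_weights and combined_loss are all hypothetical; the text above does not specify the actual schedule YOLO26 uses.

```python
def progressive_weights(epoch, total_epochs, start=0.8, end=0.1):
    """Return (training_only_weight, inference_head_weight) for an epoch.

    start and end are hypothetical values chosen for illustration only.
    """
    if total_epochs < 2:
        frac = 1.0
    else:
        frac = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    aux_weight = start + (end - start) * frac
    return aux_weight, 1.0 - aux_weight


def combined_loss(aux_loss, inference_loss, epoch, total_epochs):
    """Blend the two head losses using the schedule above."""
    w_aux, w_inf = progressive_weights(epoch, total_epochs)
    return w_aux * aux_loss + w_inf * inference_loss


for epoch in (0, 50, 99):
    w_aux, w_inf = progressive_weights(epoch, 100)
    print(f"epoch {epoch:3d}: training-only={w_aux:.2f} inference={w_inf:.2f}")
```

Under these assumptions, early epochs lean on the training-only head for dense supervision, while late epochs optimize mostly the head that actually runs at deployment.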

Unified Architecture Across Tasks

YOLO26 is not just a detector. The family includes task-specific heads and losses for:

  • Instance segmentation
  • Pose estimation
  • Oriented bounding box detection
  • Image classification
  • Open-vocabulary detection (YOLOE-26)

All tasks share a common backbone and neck, with modular heads swapped per task. The five scales (n/s/m/l/x) let you trade off speed vs. accuracy. For example, YOLO26n achieves 40.9 mAP on COCO at 1.7 ms T4 TensorRT latency, while YOLO26x hits 57.5 mAP at 11.8 ms.

Open-Vocabulary Extension: YOLOE-26

YOLOE-26 extends YOLO26 to open-vocabulary detection driven by text prompts, visual prompts, or no prompt at all. It achieves 40.6 AP on LVIS minival under text prompting. This makes it suitable for zero-shot detection tasks where categories are not predefined.

Performance Benchmarks

All models were benchmarked on COCO val2017 with T4 TensorRT FP16. Latency includes preprocessing and postprocessing. Key numbers:

Scale   mAP    Latency (ms)
n       40.9   1.7
s       46.2   2.8
m       50.8   4.5
l       54.1   7.2
x       57.5   11.8

Comparison with prior YOLO versions shows consistent improvements across all scales. For instance, YOLO26m outperforms YOLOv8m by 3.2 mAP at similar latency.
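One way to use the benchmark table in this section is to pick the most accurate scale that fits a latency budget. The sketch below copies the table's numbers; the helper name best_scale_under is hypothetical, and the figures apply only to the T4 TensorRT FP16 setup described above.

```python
# (scale, COCO mAP, T4 TensorRT FP16 latency in ms), copied from the table
BENCHMARKS = [
    ("n", 40.9, 1.7),
    ("s", 46.2, 2.8),
    ("m", 50.8, 4.5),
    ("l", 54.1, 7.2),
    ("x", 57.5, 11.8),
]


def best_scale_under(budget_ms):
    """Return the highest-mAP scale whose latency fits the budget, or None."""
    candidates = [row for row in BENCHMARKS if row[2] <= budget_ms]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row[1])[0]


print(best_scale_under(5.0))  # m
```

On other hardware the latencies will differ, so the table should be re-measured before relying on a selection like this.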

Code and Deployment

Code and pretrained models are available at the official repository. The package supports ONNX, TensorRT, CoreML, and TFLite export. To run inference on an image:

from ultralytics import YOLO

# Load a pretrained model
model = YOLO('yolo26n.pt')

# Run inference
results = model('image.jpg')

# Display the annotated image
results[0].show()
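To work with detections programmatically rather than viewing them, the result object can be unpacked into plain Python values. This is a minimal sketch that assumes the usual Ultralytics Results attributes (boxes.xyxy, boxes.conf, boxes.cls, names); the helper name summarize is hypothetical.

```python
def summarize(result):
    """Return (class_name, confidence, [x1, y1, x2, y2]) for each detection."""
    boxes = result.boxes
    rows = []
    for xyxy, conf, cls in zip(
        boxes.xyxy.tolist(), boxes.conf.tolist(), boxes.cls.tolist()
    ):
        name = result.names[int(cls)]  # map class index to its label
        rows.append((name, round(conf, 3), [round(v, 1) for v in xyxy]))
    return rows


# Usage with the inference snippet above (needs ultralytics and model weights):
# for name, conf, box in summarize(results[0]):
#     print(name, conf, box)
```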

Training follows the same API as previous Ultralytics releases. The new optimizer MuSGD is selected via optimizer='MuSGD' in the training config.
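A minimal training-and-export sketch under these assumptions: the optimizer name comes from the sentence above, coco8.yaml is the small sample dataset config that ships with Ultralytics, and the epoch count is illustrative only. The helper name train_and_export is hypothetical.

```python
TRAIN_ARGS = {
    "data": "coco8.yaml",   # small sample dataset config shipped with Ultralytics
    "epochs": 3,            # illustrative; real runs need far more
    "imgsz": 640,
    "optimizer": "MuSGD",   # optimizer name as given in the text above
}

# Ultralytics export format strings for the targets named in this section
EXPORT_FORMATS = {
    "ONNX": "onnx",
    "TensorRT": "engine",
    "CoreML": "coreml",
    "TFLite": "tflite",
}


def train_and_export(weights="yolo26n.pt", export_to="ONNX"):
    """Fine-tune a pretrained model, then export it to the chosen format."""
    # Imported lazily so the config above can be inspected without the package.
    from ultralytics import YOLO

    model = YOLO(weights)
    model.train(**TRAIN_ARGS)
    return model.export(format=EXPORT_FORMATS[export_to])


# Usage (downloads weights and data on first run):
# exported_path = train_and_export(export_to="TensorRT")
```

Keeping the arguments in a dictionary makes it easy to compare a MuSGD run against another optimizer by changing a single key.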

Why This Matters for Developers

If you deploy real-time vision models on edge devices or in production, YOLO26 offers a drop-in upgrade. The removal of NMS and DFL reduces inference complexity and latency. The unified pipeline means you can switch between tasks without changing the backbone. The open-vocabulary extension opens up zero-shot use cases. For teams currently using YOLOv8 or YOLOv9, migrating to YOLO26 should yield immediate accuracy gains without code rewrites.