Magenta RealTime 2 Ships Open-Weight Live Music Model with 40ms Frame Latency
Google's Magenta team has released Magenta RealTime 2 (MRT2), an open-weights model and inference engine for real-time AI music generation. The 2.4B-parameter model runs on Apple Silicon MacBooks with a 40ms frame size and ~200ms control latency, roughly a 15x reduction from the original Magenta RealTime's ~3s.
Unlike offline generative music models that process a prompt into a static track, MRT2 is a live, interactive instrument. It accepts MIDI, text, and audio inputs, and generates audio continuously with low latency. The model is released under an open license, along with a C++ inference engine, a Python library, and example applications.
Architecture: Frame-Level Autoregression with Sliding Window Attention
MRT2 is a codec language model using the SpectroStream codec to compress 48kHz stereo audio into tokens at 3 kbps (25 Hz frame rate, 12 residual vector quantization tokens per frame, vocabulary size 1024). The key architectural change from the original Magenta RealTime is moving from chunk-level to frame-level autoregression.
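The codec figures above are internally consistent. A 1024-entry codebook carries 10 bits per token. Twelve tokens per frame at 25 frames per second gives 300 tokens/s, which works out to 3 kbps. Each frame spans 40ms, or 1,920 samples per channel at 48kHz. The few lines of Python below just make that arithmetic explicit; the constant names are illustrative and do not come from the MRT2 code.

```python
import math

SAMPLE_RATE_HZ = 48_000  # SpectroStream input/output sample rate
FRAME_RATE_HZ = 25       # codec frames per second
RVQ_LEVELS = 12          # residual vector quantization tokens per frame
CODEBOOK_SIZE = 1024     # vocabulary size per RVQ level

bits_per_token = math.log2(CODEBOOK_SIZE)             # 10 bits
tokens_per_second = FRAME_RATE_HZ * RVQ_LEVELS        # 300 tokens/s
bitrate_bps = tokens_per_second * bits_per_token      # 3000 bps = 3 kbps
frame_ms = 1000 / FRAME_RATE_HZ                       # 40 ms per frame
samples_per_frame = SAMPLE_RATE_HZ // FRAME_RATE_HZ   # 1920 per channel

print(bitrate_bps, frame_ms, samples_per_frame)  # 3000.0 40.0 1920
```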
The original MRT generated 2-second chunks (400 tokens) at a time, which imposed a minimum control delay of 2 seconds. MRT2 instead generates one frame (12 tokens, 40ms) per step using a decoder-only Transformer with causal sliding window attention. Conditioning signals (MIDI, text, audio) are aligned to frames and injected at every step, so the model can respond to a control change within a single frame rather than waiting for the next chunk boundary.
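A toy timing model shows why step size dominates control latency. It assumes conditioning is read only at the start of each generation step. It also ignores compute and codec-decode time, so it captures only the wait that step granularity imposes. The function is a hypothetical illustration, not MRT2 code.

```python
import math

FRAME_MS = 40  # one codec frame at 25 Hz

def wait_for_next_step_ms(change_ms, step_frames):
    """Time from a control change until the next generation step starts.

    Toy model: conditioning is sampled only when a step begins, and each
    step emits `step_frames` codec frames of audio.
    """
    step_ms = step_frames * FRAME_MS
    next_step_start = math.ceil(change_ms / step_ms) * step_ms
    return next_step_start - change_ms

# A change arriving 1 ms after a step boundary is the near-worst case.
chunk_wait = wait_for_next_step_ms(1, step_frames=50)  # 2 s chunks (original MRT)
frame_wait = wait_for_next_step_ms(1, step_frames=1)   # single frames (MRT2)
print(chunk_wait, frame_wait)  # 1999 39
```

The chunk-level worst case approaches the full 2-second chunk. The frame-level worst case is bounded by one 40ms frame. The rest of the ~200ms budget comes from decoding and buffering, which this model leaves out.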
To handle long sequences with bounded memory, the sliding window attention evicts old key-value cache entries beyond a fixed window size. The team added learnable attention sink embeddings to prevent quality degradation when initial tokens are evicted, and dropped positional embeddings (NoPE) to improve length generalization — they found RoPE hurt performance beyond training length.
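The eviction-plus-sink scheme can be sketched as a single-head KV cache in plain Python. This is a minimal illustration resting on two assumptions. First, the sinks are modeled as extra key/value vectors that every query attends to and that are never evicted. This is one common way to parameterize learnable sinks; the text does not say exactly how MRT2 does it. Second, no positional encoding is applied anywhere, which matches NoPE. Class and argument names are hypothetical.

```python
import math
from collections import deque

class SlidingWindowKVCache:
    """Toy single-head KV cache with never-evicted sink entries.

    Keeps only the most recent `window` key/value pairs. The sink vectors
    stand in for learnable attention sinks: every query can attend to them,
    so attention mass has a stable target after old tokens are evicted.
    """

    def __init__(self, window, sink_keys, sink_values):
        self.keys = deque(maxlen=window)    # oldest pair is dropped automatically
        self.values = deque(maxlen=window)
        self.sink_keys = list(sink_keys)
        self.sink_values = list(sink_values)

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)

    def attend(self, query):
        # NoPE: scores depend only on content; no position is added anywhere.
        ks = self.sink_keys + list(self.keys)
        vs = self.sink_values + list(self.values)
        scale = 1.0 / math.sqrt(len(query))
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in ks]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        return [
            sum(w * v[d] for w, v in zip(weights, vs)) / total
            for d in range(len(vs[0]))
        ]

cache = SlidingWindowKVCache(window=3, sink_keys=[[0.0, 0.0]], sink_values=[[0.0, 0.0]])
for t in range(10):
    cache.append([1.0, float(t)], [float(t), 1.0])
print(len(cache.keys), cache.keys[0])  # 3 [1.0, 7.0]
out = cache.attend([0.5, 0.1])
```

Memory stays fixed at `window` entries plus the sinks no matter how long the stream runs. That bounded footprint is what lets a live instrument generate indefinitely.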
Inference Engine: C++ with MLX on Apple Silicon
The inference engine is written in C++ and uses Apple's MLX framework to run on Apple Silicon GPUs. The model is implemented in Python using the SequenceLayers library, then compiled into an .mlxfn file (bundling weights and computational graph). The C++ engine loads this file and executes it via the MLX runtime, handling audio buffering, resampling, and MIDI input.
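The buffering job can be illustrated with a small FIFO that adapts 40ms frames to whatever block size the audio device requests. This Python sketch is not the MRT2 engine, which is C++ and not shown in the source. It assumes 48kHz stereo output and represents samples as (left, right) tuples. Underruns are filled with silence. All names are hypothetical.

```python
from collections import deque

SAMPLE_RATE = 48_000
FRAME_SAMPLES = SAMPLE_RATE * 40 // 1000  # 1920 stereo samples per 40 ms frame

class AudioFifo:
    """Hypothetical adapter between the frame generator and the audio callback."""

    def __init__(self):
        self.samples = deque()  # queued (left, right) pairs
        self.underruns = 0      # samples replaced by silence so far

    def push_frame(self, frame):
        # The generator hands over exactly one codec frame at a time.
        if len(frame) != FRAME_SAMPLES:
            raise ValueError(f"expected {FRAME_SAMPLES} samples, got {len(frame)}")
        self.samples.extend(frame)

    def pull_block(self, n):
        # The audio callback asks for n samples; missing audio becomes silence.
        block = []
        for _ in range(n):
            if self.samples:
                block.append(self.samples.popleft())
            else:
                block.append((0.0, 0.0))
                self.underruns += 1
        return block

fifo = AudioFifo()
fifo.push_frame([(0.5, 0.5)] * FRAME_SAMPLES)
for _ in range(4):  # 4 x 512 = 2048 requested, but only 1920 are queued
    block = fifo.pull_block(512)
print(fifo.underruns)  # 128
```

An underrun means generation fell behind playback. Avoiding that is what the real-time hardware requirements are about.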
Real-time performance (generating audio faster than playback) requires specific hardware:
- Base model (2.4B): MacBook M3 Pro or higher, or M2 Max or higher
- Small model (230M): Any Apple Silicon Mac, including MacBook Air
Both models can run offline (non-real-time) on any Apple Silicon Mac.
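At this frame size, "faster than playback" has a simple operational meaning. Each 40ms frame, including depth and codec decoding, must be produced in under 40ms of wall-clock time. Here is a minimal sketch of that check. `generate_frame` is a hypothetical callable standing in for one full generation step.

```python
import time

FRAME_MS = 40.0  # audio duration produced per generation step

def real_time_factor(compute_ms_per_frame):
    """Compute time divided by audio time; below 1.0 keeps up with playback."""
    return compute_ms_per_frame / FRAME_MS

def measure_rtf(generate_frame, n_frames=50):
    """Average real-time factor over `n_frames` calls to a per-frame step."""
    start = time.perf_counter()
    for _ in range(n_frames):
        generate_frame()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return real_time_factor(elapsed_ms / n_frames)

print(real_time_factor(20.0))  # 0.5 -> keeps up with 2x headroom
print(real_time_factor(48.0))  # 1.2 -> falls behind playback
```

An average below 1.0 can still hide occasional slow frames. In practice you want some headroom, which is one reason the larger model needs the faster chips.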
Example Applications and Integrations
The release includes a suite of example applications: standalone apps, DAW plugins, and extensions. These demonstrate sound cloning, style blending, and live accompaniment. The Python library (pip install magenta-rt) provides inference via JAX/MLX using SequenceLayers.
How to Get Started
- Download the apps from the Magenta website (requires Apple Silicon Mac).
- Install the Python library: pip install magenta-rt
- Use the C++ inference engine for DAW integration or custom instrument development.
The team plans to add finetuning support and more performance tools, and will be at the Music Technology Hackathon in Boston showcasing MRT2.
Technical Details: Latency Breakdown
Control latency is ~200ms, composed of:
- Frame processing: 40ms (one frame)
- Depth decode: time to decode 12 RVQ tokens per frame
- Codec decode: time to convert tokens to audio waveform
The exact breakdown depends on hardware and model size, but the team reports a ~15x improvement over MRT's 3s latency.
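The depth-decode term grows with RVQ depth if a frame's 12 tokens are produced one level at a time, each conditioned on the coarser levels already chosen. The sketch below shows that sequential loop. Coarse-to-fine sampling within a frame is an assumption, consistent with the "depth decode" step but not spelled out in the source. `level_scores` is a hypothetical stand-in for the model's per-level prediction head.

```python
import math
import random

RVQ_LEVELS = 12       # tokens per frame
CODEBOOK_SIZE = 1024  # vocabulary per level

def decode_frame(level_scores, frame_context, rng):
    """Sample one frame's RVQ tokens, coarsest level first.

    level_scores(frame_context, prefix) -> CODEBOOK_SIZE unnormalized logits
    is a hypothetical model call; `prefix` holds the tokens chosen so far.
    """
    tokens = []
    for _ in range(RVQ_LEVELS):  # 12 sequential model calls per frame
        logits = level_scores(frame_context, tokens)
        top = max(logits)
        weights = [math.exp(x - top) for x in logits]  # unnormalized softmax
        tokens.append(rng.choices(range(CODEBOOK_SIZE), weights=weights)[0])
    return tokens

calls = []

def toy_scores(context, prefix):
    # Deterministic toy head: only token == current level index is allowed.
    calls.append(len(prefix))
    return [0.0 if i == len(prefix) else float("-inf") for i in range(CODEBOOK_SIZE)]

frame_tokens = decode_frame(toy_scores, frame_context=None, rng=random.Random(0))
print(frame_tokens)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(len(calls))    # 12
```

The loop cannot be parallelized across levels, so per-frame depth-decode cost scales with the number of RVQ levels. That cost must fit inside the 40ms frame budget alongside codec decoding.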
Citation
If you use MRT2 in your work, cite:
@article{mrt2,
title = {Magenta RealTime 2: Open & Local Live Music Models},
author = {Magenta Team},
year = {2026},
note = {https://magenta.withgoogle.com/magenta-realtime-2}
}


