The Problem: N64 Additive Blending Breaks

The PlayStation had a simple src + dst blend mode that clamped values to the maximum. The N64's Reality Display Processor (RDP) has a flexible color combiner but doesn't clamp the result. Colors wrap around when they exceed the range, producing ugly artifacts.

|                      |   R |   G |   B |
| src (sprite)         | 171 |  42 | 226 |
| + dst (framebuffer)  |  63 | 141 | 170 |
| = result             | 234 | 183 | 140 |  // B wraps: 226+170 = 396 → 396-256 = 140

Most N64 games used a 16-bit framebuffer (RGBA 5551) for performance, but the RDP can draw into 32-bit buffers. The author exploits this.

The Solution: Draw at 1/8 Intensity, Then Expand

The idea: render all additive-blended sprites into a 32-bit RGBA 8888 buffer but with colors pre-scaled to 1/8 of their original brightness. This gives headroom for multiple overlapping glows without wrapping.

Instead of preprocessing assets, the color combiner can scale on the fly using the fog alpha value:

rdpq_set_fog_color(RGBA32(0, 0, 0, 256/8));
rdpq_mode_blender(RDPQ_BLENDER((IN_RGB, FOG_ALPHA, MEMORY_RGB, ONE)));

This multiplies every incoming color by FOG_ALPHA/256 = 32/256, i.e. divides it by 8. All additive blending happens in this scaled-down space, so the 8-bit components never exceed 255 even with several overlapping layers.

The Conversion: RSP Microcode to the Rescue

After rendering, the 32-bit buffer must be converted to 16-bit (RGBA 5551) for display, with clamping. Doing this on the CPU takes ~70ms per 320×240 frame — too slow.

Enter the Reality Signal Processor (RSP). Its 128-bit vector instructions process 8 pixels at a time. With optimized microcode (written in RSPL, a C-like language by HailToDodongo), the conversion runs in 3.1ms.

// CPU reference implementation. libdragon's color_from_packed32() splits a
// packed 32-bit RGBA value into 8-bit components. Since everything was drawn
// at 1/8 intensity, a stored value of 31 already means full brightness --
// clamping to 31 and packing straight into the 5-bit fields performs the
// 8x expansion for free.
void cpu_rgba_8888_to_5551(uint32_t *rgba32_in, uint16_t *rgba16_out) {
    for (int i = 0; i < 320 * 240; i++) {
        color_t c = color_from_packed32(rgba32_in[i]);
        if (c.r > 31) { c.r = 31; }
        if (c.g > 31) { c.g = 31; }
        if (c.b > 31) { c.b = 31; }
        // RGBA 5551: 5 bits per color channel, alpha bit forced to 1
        rgba16_out[i] = (c.r << 11) | (c.g << 6) | (c.b << 1) | 0x1;
    }
}

The Full Pipeline

  1. Init a 16-bit framebuffer for final display.
  2. Allocate a secondary 32-bit render buffer.
  3. Set fog alpha to 32 (256/8) and configure blender to multiply by fog alpha.
  4. Render all additive-blended sprites into the 32-bit buffer.
  5. Kick off RSP conversion from 32-bit to 16-bit.
  6. Present the 16-bit framebuffer.

display_init(RESOLUTION_320x240, DEPTH_16_BPP, 3, GAMMA_NONE, FILTERS_DISABLED);
surface_t render32 = surface_alloc(FMT_RGBA32, 320, 240);

// Steps 3-4: draw all additive sprites at 1/8 intensity into the 32-bit buffer
rdpq_set_color_image(&render32);
rdpq_set_fog_color(RGBA32(0, 0, 0, 256/8));
rdpq_mode_blender(RDPQ_BLENDER((IN_RGB, FOG_ALPHA, MEMORY_RGB, ONE)));
render_scene();

// Steps 5-6: convert on the RSP, then present the 16-bit framebuffer
surface_t *screen = display_get();
rsp_rgba_8888_to_5551(render32.buffer, screen->buffer);
display_show(screen);

Performance Trade-offs

Drawing into a 32-bit buffer doubles the memory bandwidth per pixel, hurting fill rate. The author notes this technique is "good enough for some applications" and suggests optimizations such as rendering only the additive-blended sprites into the 32-bit buffer, possibly at a lower resolution.

Demo and Code

A complete demo project is available on GitHub: github.com/phoboslab/n64_addblend. It includes the RSP microcode and an example scene with additive explosions.

Why This Matters for N64 Homebrew

This workaround finally brings proper additive blending to the N64 without custom RSP code for every effect. Modern tooling (Libdragon, RSPL) makes it accessible. If you're making an N64 game and want glow effects, this is the way.