GHC ApplicativeDo Optimization: O(n³) to O(n²) via RNA Foldi

GHC ApplicativeDo Optimization: O(n³) to O(n²) via RNA Folding

GHC's optimal ApplicativeDo flag is off by default due to O(n³) complexity. Ian Duncan reduced it to near O(n²) by applying the same dynamic programming used in RNA folding prediction, cutting compile times for 1000-statement blocks from 55 seconds to under 2.

3 min readJun 2, 2026

GHC ApplicativeDo Optimization: O(n³) to O(n²) via RNA Folding

The Problem: Optimal ApplicativeDo is Too Slow

GHC has a flag -foptimal-applicative-do that produces the best possible scheduling of independent statements in do notation. It's off by default because the algorithm is O(n³). A 1000-statement block takes 55 seconds to compile with the optimal flag, vs under 2 seconds with the default greedy heuristic.

Ian Duncan set out to fix this. The core problem: given a sequence of statements with dependencies, find the tree of Seq (sequential) and Par (parallel) combinators that minimizes the number of rounds (sequential depth). This is a dynamic programming problem over spans [i,j], with O(n²) spans and O(n) split points each, yielding O(n³).

The Insight: Borrow from RNA Folding

The recurrence for optimal cost is identical to Nussinov's algorithm for RNA secondary structure prediction. In RNA, bases pair to minimize free energy; in ApplicativeDo, statements group to minimize rounds. Both use the same triangular table and combine-over-split-point logic.

Duncan realized that for "tangled" spans (where no parallel grouping is possible), the optimal split must be at one of the two extremes: peel off the first statement or the last. This is because the cost function is monotonic — adding statements only increases cost. So any interior split would sum two larger sub-costs, which can't be cheaper than peeling a single statement. The only exception is when both extremes tie, requiring further search.

The Optimization: Bounding with Longest Chain

When a tie occurs, the optimal cost is either c or c+1. If the longest chain through the span equals c+1, the lower bound meets the upper bound, and the search terminates immediately. For a pure chain of dependencies, every span ties, but the longest chain equals its length, which is always c+1. So the algorithm performs a single check per span instead of O(n) splits.

Combined with the paper's existing optimizations (extreme-cut shortcut for non-tied spans), the total cut evaluations drop dramatically:

100-chain: 166,650 naive → 4,950 optimized
100 realistic: 4,544 → 1,426
200 realistic: 47,471 → 3,583

The trees produced are identical to the naive optimal — same schedule, less work.

Residual Worst Case

Not all inputs benefit. A "hub" pattern (every statement feeds one final sink) still requires full O(n³) because every span ties with longest chain = 2, never closing the bound. Such shapes are rare in real code. For those, Duncan suggests a follow-up algorithm (not detailed here).

Practical Impact

This optimization makes -foptimal-applicative-do viable for everyday use. Developers using Haxl or other batching monads can now write plain do notation without worrying about manual <$> and <*> refactoring, and get optimal parallelism without 55-second compile times.

The patch is not yet merged into GHC; Duncan is working on it. But the technique — borrowing DP from computational biology — is a neat example of cross-domain algorithm transfer.

Code Example

Consider a block with two independent fetches:

numCommonFriends x y = do
  fx &lt;- friendsOf x
  fy &lt;- friendsOf y
  return (length (intersect fx fy))

With -foptimal-applicative-do, GHC desugars this to:

(\fx fy -&gt; length (intersect fx fy)) &lt;$&gt; friendsOf x &lt;*&gt; friendsOf y

Instead of the sequential:

friendsOf x &gt;&gt;= \fx -&gt; friendsOf y &gt;&gt;= \fy -&gt; return (length (intersect fx fy))

This reduces two network round-trips to one.

Editor's Take

I've been burned by GHC's greedy ApplicativeDo in Haxl code before — it left performance on the table for complex dependency graphs. This optimization is exactly what I'd hoped someone would tackle. The RNA folding connection is a delightful surprise, but I'm more excited that the patch makes optimal scheduling a default candidate. If it lands in GHC 9.12, I'll flip the flag on in all my projects.

— DevDigest Editorial

Key Takeaways

•Enable -foptimal-applicative-do in your .cabal or package.yaml for Haxl projects to get automatic batching without manual refactoring.
•For large do blocks (100+ statements), expect compile times to drop from minutes to seconds with the optimized algorithm.
•If you encounter a 'hub' pattern (many independent dependencies feeding a single sink), consider restructuring to avoid the residual O(n³) case.

Why It Matters

If you use Haxl or any batching monad in Haskell, enabling optimal ApplicativeDo can cut latency significantly. This fix removes the compile-time penalty that kept the feature off by default, making it practical for production builds.

#optimization#compiler#Haskell#ghc#applicative-do

Get the weekly digest

Every Sunday - top tech stories, industry breakthroughs, and developer tools delivered to your inbox.

No spam, unsubscribe anytime.

GHC ApplicativeDo Optimization: O(n³) to O(n²) via RNA Folding

The Problem: Optimal ApplicativeDo is Too Slow

The Insight: Borrow from RNA Folding

The Optimization: Bounding with Longest Chain

Residual Worst Case

Practical Impact

Code Example

Editor's Take

Key Takeaways

Why It Matters

Get the weekly digest

You might also like

Claude Code Left 10 Orphaned Busy-Loops Saturating My CPU for 2 Days

NpgsqlRest: Delete Your Backend, Keep PostgreSQL

OpenStrike Ships: A Counter-Strike FPS Running at 60 FPS on a 2004 PSP

Static Search Trees: 40x Faster Than Binary Search on Large Datasets

Porting nanochat to TPU v6e: PyTorch Wins and JAX Breaks

NpgsqlRest: Delete Your Backend, Keep PostgreSQL