AI Agent Writes 100K Lines of Rust for Multi-Paxos Engine

A developer has demonstrated that AI coding agents can build production-grade distributed systems, producing a Rust-based multi-Paxos consensus engine with 100K lines of code written in roughly 4 weeks. The project modernizes Azure's Replicated State Library (RSL), adding pipelining, NVM support, and achieving a throughput improvement from ~23K ops/sec to ~300K ops/sec.

The project, detailed in a blog post by Cheng Huang, stress-tests multiple AI coding agents including GitHub Copilot, Claude Code, Codex, Augment Code, Kiro, and Trae. Huang settled on Claude Code and Codex CLI as primary drivers, with VS Code handling diffs.

Code Contracts by AI, For AI

Huang's breakthrough technique is AI-driven code contracts. These specify preconditions, postconditions, and invariants for critical functions, converted into runtime asserts during testing but disabled in production.

Three levels of application:

Ask AI to write contracts – Huang found GPT-5 High produces excellent contracts; Opus 4.1 is also good. He reviews and refines.
Generate tests from contracts – AI creates targeted test cases for each post-condition.
Property-based tests for contracts – AI translates contracts into property-based tests exploring randomized inputs. Any contract violation triggers a panic, exposing deep bugs early.

One AI-generated contract caught a subtle Paxos safety violation that could have caused replication inconsistency. The system now has 1,300+ tests covering unit, integration, and multi-replica scenarios with injected failures.

Lightweight Spec-Driven Development

Huang initially followed rigid Spec-Driven Development (SDD) with requirement, design, and task markdowns but found it too rigid. He switched to a lightweight approach using /specify from spec kit to generate a spec markdown with user stories and acceptance criteria. Then /clarify asks AI to self-critique and improve. Huang spends most time here.

For implementation, each user story is a single unit of work. AI generates a plan for that story. Huang reports this "sweet spot" works well with current AI capabilities.

Aggressive Performance Optimization

After correctness, Huang spent ~3 weeks purely on throughput tuning. The iterative loop:

Ask AI to instrument latency metrics across all code paths.
Run performance tests and output trace logs.
Let AI analyze latency breakdowns (writes Python scripts to calculate quantiles and identify bottlenecks).
AI proposes optimizations, implements one, re-measure, repeat.

This surfaced lock contention on async paths, redundant memory copies, and unnecessary task spawns. Key gains came from minimizing allocations, zero-copy techniques, avoiding locks, and selectively removing async overhead.

Wish List

Huang's wish list for future AI-assisted coding:

End-to-End User Story Execution: AI should take more autonomy to drive from user story to delivery.
Automated Contract Workflows: AI should generate tests from contracts, debug failures, and write property-based tests automatically, only alerting for genuine correctness issues.
Autonomous Performance Optimization: Repetitive tuning loops should be automated, with AI executing experiments independently.

Project Status

The project addresses 2 of 3 RSL limitations: pipelining and NVM support (using a verified persistence log from the PoWER Never Corrupts paper at OSDI 2025). RDMA support is TBD. The codebase is over 130K lines of Rust, with tests accounting for >65%.

Why It Matters

This project shows AI agents can now handle complex, correctness-critical systems programming. For developers building distributed systems, the techniques of AI-generated code contracts and automated performance profiling offer a template for leveraging AI to produce high-quality, optimized code much faster.

Editor's Take

I've been skeptical of AI writing production code for safety-critical systems, but the contract-driven approach here changes my mind. The key insight is that AI can self-verify through contracts and property-based tests, catching subtle bugs humans miss. I've used similar contracts in C# projects, but the AI integration here is a leap forward. My prediction: within a year, most new distributed systems will be built with this pattern.

Developer Insights

Use AI to generate code contracts (pre/post conditions) for critical functions, then have AI produce property-based tests from those contracts.
For complex features, break them into single user stories and use AI to generate implementation plans for each.
Automate performance profiling by having AI instrument latency metrics, analyze trace logs, and propose optimizations in an iterative loop.

AI Agent Writes 100K Lines of Rust for Multi-Paxos Engine