What Is Sakana Fugu?
Sakana Fugu is a multi-agent system that dynamically orchestrates a pool of expert models to solve complex, multi-step tasks. It exposes a single OpenAI-compatible API, so you don't need to manage multiple providers or endpoints. Fugu learns how to coordinate agents using reinforcement learning, rather than relying on human-designed workflows.
How It Works
Fugu is built on two ICLR 2026 papers: TRINITY and Conductor. TRINITY uses a lightweight evolved coordinator to assign roles (Thinker, Worker, Verifier) to different LLMs across multiple turns. Conductor uses reinforcement learning to discover coordination strategies expressed in natural language, including how agents communicate and the focused prompts each agent receives.
This learned orchestration means Fugu can dynamically assemble agents from a pool and coordinate them through non-obvious but efficient collaboration patterns. You don't need domain knowledge to prescribe team organization or roles.
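The role-assignment idea can be illustrated with a toy loop. This is not TRINITY's evolved coordinator or Conductor's learned strategy. Here the coordinator is replaced by a fixed, hand-written list of (role, agent) pairs, and `run_roles`, the agent names, and the stub agents are all hypothetical. The sketch only shows one possible data flow, in which each turn's agent acts in an assigned role and sees the transcript of earlier turns.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical agent type: a function from prompt text to response text.
Agent = Callable[[str], str]

ROLES = ("Thinker", "Worker", "Verifier")


@dataclass
class Turn:
    role: str
    agent: str
    output: str


def run_roles(task: str, pool: dict[str, Agent],
              assignment: list[tuple[str, str]]) -> list[Turn]:
    """Run one turn per (role, agent_name) pair, passing each agent the transcript so far.

    `assignment` stands in for a learned coordinator; in this toy it is a fixed list.
    """
    transcript: list[Turn] = []
    for role, name in assignment:
        if role not in ROLES:
            raise ValueError(f"unknown role: {role}")
        history = "\n".join(f"[{t.role}/{t.agent}] {t.output}" for t in transcript)
        prompt = f"Role: {role}\nTask: {task}\n{history}"
        transcript.append(Turn(role, name, pool[name](prompt)))
    return transcript


# Stub agents standing in for real LLMs (hypothetical names and behavior).
pool = {
    "model_a": lambda p: "plan: split into steps",
    "model_b": lambda p: "draft answer",
    "model_c": lambda p: "verified" if "draft answer" in p else "rejected",
}
turns = run_roles(
    "merge two sorted lists",
    pool,
    [("Thinker", "model_a"), ("Worker", "model_b"), ("Verifier", "model_c")],
)
for t in turns:
    print(t.role, t.agent, t.output)
```

In a learned system, the interesting part is how the assignment list is chosen per request; keeping it as plain data here makes that seam explicit.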
Two Models: Fugu and Fugu Ultra
Fugu comes in two variants:
- Fugu: Balanced performance and latency. Ideal for everyday coding, code review (via Codex), and responsive chatbots.
- Fugu Ultra: Optimized for performance. Uses a deeper pool of expert agents for hard, high-stakes problems like Kaggle competitions, paper reproduction, and cybersecurity analysis.
Both are accessible via the same API endpoint, and you can opt specific agents out to meet data, privacy, or compliance requirements.
Benchmark Performance
Fugu models outperform publicly accessible frontier models on most of the benchmarks below and compete with Fable 5 and Mythos Preview. Key benchmark scores:
| Benchmark | Fugu | Fugu Ultra | Opus 4.8 | Gemini 3.1 Pro | GPT 5.5 |
|---|---|---|---|---|---|
| SWE Bench Pro* | 59.0 | 73.7 | 69.2 | 54.2 | 58.6 |
| TerminalBench 2.1 | 80.2 | 82.1 | 74.6 | 70.3 | 78.2 |
| LiveCodeBench | 92.9 | 93.2 | 87.8 | 88.5 | 85.3 |
| Humanity's Last Exam | 47.2 | 50.0 | 49.8 | 44.4 | 41.4 |
| GPQA-D | 95.5 | 95.5 | 92.0 | 94.3 | 93.6 |
Fugu Ultra has the highest score on every benchmark in the table (tying Fugu on GPQA-D), notably SWE Bench Pro (73.7) and TerminalBench 2.1 (82.1).
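To see how large Fugu Ultra's lead is on each benchmark, the following Python sketch computes its margin, in points, over the best non-Fugu model in each row. The scores are copied from the table above; `SCORES` and `ultra_margins` are illustrative names, and the footnote attached to SWE Bench Pro is not reproduced.

```python
# Scores copied from the benchmark table.
SCORES = {
    "SWE Bench Pro": {"Fugu": 59.0, "Fugu Ultra": 73.7, "Opus 4.8": 69.2, "Gemini 3.1 Pro": 54.2, "GPT 5.5": 58.6},
    "TerminalBench 2.1": {"Fugu": 80.2, "Fugu Ultra": 82.1, "Opus 4.8": 74.6, "Gemini 3.1 Pro": 70.3, "GPT 5.5": 78.2},
    "LiveCodeBench": {"Fugu": 92.9, "Fugu Ultra": 93.2, "Opus 4.8": 87.8, "Gemini 3.1 Pro": 88.5, "GPT 5.5": 85.3},
    "Humanity's Last Exam": {"Fugu": 47.2, "Fugu Ultra": 50.0, "Opus 4.8": 49.8, "Gemini 3.1 Pro": 44.4, "GPT 5.5": 41.4},
    "GPQA-D": {"Fugu": 95.5, "Fugu Ultra": 95.5, "Opus 4.8": 92.0, "Gemini 3.1 Pro": 94.3, "GPT 5.5": 93.6},
}
FUGU_MODELS = {"Fugu", "Fugu Ultra"}


def ultra_margins(scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """Fugu Ultra's score minus the best non-Fugu score, per benchmark (rounded to 0.1)."""
    margins = {}
    for bench, row in scores.items():
        best_other = max(v for model, v in row.items() if model not in FUGU_MODELS)
        margins[bench] = round(row["Fugu Ultra"] - best_other, 1)
    return margins


for bench, margin in ultra_margins(SCORES).items():
    print(f"{bench}: +{margin:.1f} points")
```

Rounding to one decimal matches the precision of the table and hides floating-point noise from the subtraction.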
Qualitative Results
In AutoResearch, an experiment in which an AI agent autonomously improves the training recipe of a small GPT, Fugu Ultra achieved the best mean bits-per-byte (BPB, lower is better) of 0.9774 ± 0.0019 over 123 experiments on a single H100 GPU, ahead of Model C (0.9781), Model B (0.9793), and Model A (0.9822). Its best single run reached 0.9748.
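The source does not say exactly how AutoResearch computes BPB. The sketch below uses the common definition: total cross-entropy over the evaluated text, converted from nats to bits, divided by the number of UTF-8 bytes in that text. It also assumes the ± figure is a sample standard deviation across runs, which the source does not specify. `bits_per_byte` and `summarize_runs` are hypothetical helpers.

```python
import math
import statistics


def bits_per_byte(token_nll_nats: list[float], num_bytes: int) -> float:
    """Convert per-token negative log-likelihoods (natural log) into bits per byte."""
    if num_bytes <= 0:
        raise ValueError("num_bytes must be positive")
    total_bits = sum(token_nll_nats) / math.log(2)  # nats -> bits
    return total_bits / num_bytes


def summarize_runs(bpb_per_run: list[float]) -> tuple[float, float, float]:
    """Return (mean, sample std dev, best) over runs; lower BPB is better, so best is the min.

    Assumes the reported "±" is a sample standard deviation (not stated in the source).
    """
    return statistics.mean(bpb_per_run), statistics.stdev(bpb_per_run), min(bpb_per_run)


# Four tokens, each costing ln(2) nats (1 bit), spread over 4 bytes of text:
# the result is 1.0 up to floating-point rounding.
example = bits_per_byte([math.log(2)] * 4, num_bytes=4)
print(example)
```

Normalizing by bytes rather than tokens makes the metric independent of the tokenizer, which matters when an agent is free to change the training recipe.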
API Usage
Fugu provides an OpenAI-compatible API. To switch between Fugu and Fugu Ultra, just change the model name in your request:
```python
import openai

# Point the standard OpenAI client at Sakana's endpoint; replace "your-key" with your API key.
client = openai.OpenAI(api_key="your-key", base_url="https://api.sakana.ai/v1")

response = client.chat.completions.create(
    model="fugu-ultra",  # or "fugu"
    messages=[{"role": "user", "content": "Write a Python function to merge two sorted lists."}],
)
print(response.choices[0].message.content)
```
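Because the two variants differ only by model name, a small helper can encode the guidance from the Two Models section: Fugu for everyday work, Fugu Ultra for hard, high-stakes problems. `fugu_request` and its `high_stakes` flag are hypothetical; the model strings are the ones used in the example above.

```python
def fugu_request(prompt: str, high_stakes: bool = False) -> dict:
    """Build keyword arguments for client.chat.completions.create.

    Routing rule (illustrative): use Fugu Ultra only when the task is flagged high-stakes.
    """
    model = "fugu-ultra" if high_stakes else "fugu"
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


# Usage with an OpenAI client configured for the Fugu endpoint:
# response = client.chat.completions.create(**fugu_request("Review this diff for bugs."))
# response = client.chat.completions.create(
#     **fugu_request("Reproduce the main results of this paper.", high_stakes=True)
# )
```

Returning plain keyword arguments keeps the routing decision separate from the network call, so it can be tested without an API key.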
Why It Matters
Fugu abstracts away the complexity of multi-agent orchestration. Instead of manually routing tasks to different LLMs, you get a single endpoint that learns the best collaboration pattern for each request. This reduces integration complexity and improves the cost-performance trade-off, while the option to opt specific agents out lets you meet data, privacy, or compliance requirements.
Availability
Fugu is not yet available in the EU/EEA while GDPR compliance work is underway. It is available via API elsewhere.
Next Steps
Try the Fugu API today at sakana.ai/fugu/. Start with the Fugu model for everyday tasks, and switch to Fugu Ultra for high-stakes problems. If you're building multi-agent systems, this could replace your hand-rolled orchestration.