21,077 Tokens Wasted on Tool Definitions
MCP (Model Context Protocol) connects LLMs to tools like Linear, Notion, Slack, and Postgres. The promise: a universal plug-in. The reality: context bloat, reliability issues, and redundant complexity.
Quandri measured the tool definitions from their MCP stack. With 4 servers connected, tool schemas consumed 21,077 tokens — 10.5% of a Claude 200K context window, 16.5% of GPT-4o's 128K window. Linear alone contributed 12,807 tokens for 42 tools, most of which you never use.
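The percentages above are simple division, and you can redo them for any other window size. A minimal sketch, assuming `awk` is available; `context_share` is a hypothetical helper name, not something from the original article:

```shell
# Share of a context window consumed by tool definitions, as a percentage.
# Usage: context_share TOKENS WINDOW_SIZE
context_share() {
  awk -v t="$1" -v w="$2" 'BEGIN { printf "%.1f\n", 100 * t / w }'
}

context_share 21077 200000   # 200K window
context_share 21077 128000   # 128K window
```

The two calls print 10.5 and 16.5, matching the figures quoted for the 200K and 128K windows.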
The Context Window as a Desk
Think of the context window as your workspace. Every MCP server dumps its entire tool catalog on that desk before you start. The restaurant analogy from the original article fits: you sit down, 10 menus cover the table, and there's no room for food. Every time you order, the menus get pulled out again.
MCP Is Slower and Less Reliable
MCP adds a process layer between the LLM and the API. The original article benchmarked the Jira MCP server against Jira's REST API: MCP was 3x slower per call and 9.4x slower on the first call, which includes initialization. This isn't Jira-specific: every MCP server adds the same extra layer, and with it the same kind of overhead.
Common failures include initialization errors, repeated re-authentication, mid-session crashes, and opaque permission errors. These aren't edge cases; they're architectural.
CLI Wins on Tokens and Debugging
Compare looking up a Linear issue:
CLI approach (~200 tokens):
curl -s -H "Authorization: Bearer $LINEAR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query":"{ issue(id: \"ISSUE-ID\") { title state { name } assignee { name } } }"}' \
  https://api.linear.app/graphql
Prompt: ~50 tokens, Response: ~150 tokens.
MCP approach (~12,957 tokens):
- Tool definitions always loaded: 12,807 tokens
- Tool call + response: ~150 tokens
MCP consumes roughly 65x more tokens (about 12,957 vs. about 200). And you can reproduce the CLI command in a terminal immediately, whereas MCP failures only reproduce inside the conversation.
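The ratio follows directly from the figures above. A small sketch of the arithmetic, assuming `awk` is available for the floating-point division (variable names are illustrative):

```shell
# Per-lookup token cost under each approach, using the article's figures.
cli_tokens=$(( 50 + 150 ))       # prompt + response
mcp_tokens=$(( 12807 + 150 ))    # always-loaded tool definitions + call/response

# Shell integer arithmetic truncates, so compute the ratio with awk.
ratio=$(awk -v m="$mcp_tokens" -v c="$cli_tokens" 'BEGIN { printf "%.1f", m / c }')
echo "CLI: $cli_tokens  MCP: $mcp_tokens  ratio: ${ratio}x"
```

This prints a ratio of 64.8x, which the article rounds to 65x.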
The Skills Pattern: Load Only What You Need
Skills take the opposite approach: instead of spreading every menu out upfront, you ask a librarian for the one book you need. A Linear skill looks like:
# Linear Issue Lookup Skill
- Linear API: https://api.linear.app/graphql
- Auth: Bearer Token ($LINEAR_TOKEN env var)
- Get issue: curl -s -H "Authorization: Bearer $LINEAR_TOKEN" ...
- Search issues: adjust the GraphQL query to filter results
- Results are JSON, parse with jq
The LLM only loads this into context when the skill is invoked. No wasted tokens.
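A skill like this could also point at a small helper script instead of spelling out the full command each time. The following is a sketch under assumptions: the function names `linear_issue_payload` and `linear_get_issue` are hypothetical, and the endpoint, headers, and query fields are copied from the curl example earlier in the article rather than checked against Linear's documentation.

```shell
# Hypothetical helpers a Linear skill could reference; names are illustrative.

# Build the GraphQL request body for a single issue lookup.
# NOTE: the ID is interpolated as-is, so only pass trusted identifiers.
linear_issue_payload() {
  printf '{"query":"{ issue(id: \\"%s\\") { title state { name } assignee { name } } }"}' "$1"
}

# Send the lookup to the endpoint from the article, using $LINEAR_TOKEN for auth.
linear_get_issue() {
  curl -s \
    -H "Authorization: Bearer $LINEAR_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$(linear_issue_payload "$1")" \
    https://api.linear.app/graphql
}
```

With `LINEAR_TOKEN` exported, a call such as `linear_get_issue ISSUE-ID | jq '.data.issue'` would pull out the issue object, assuming the response follows the usual GraphQL shape with a top-level `data` field.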
When MCP Still Makes Sense
MCP isn't dead for everyone. It's valid when:
- No CLI exists for the service (web-only SaaS)
- Non-developer users need access
- Real-time bidirectional communication is required
- Production databases need guardrails: MCP servers can enforce read-only mode and block dangerous queries, whereas Skills + CLI can't stop the LLM from running DROP TABLE.
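To make "block dangerous queries" concrete, here is a deliberately naive sketch of the kind of check a guardrail layer might run before forwarding SQL. It is an illustration, not how any particular MCP server works; `is_read_only_query` is a hypothetical name, and string matching like this is no substitute for a read-only database role.

```shell
# Naive illustration only; is_read_only_query is a hypothetical name.
# Returns 0 when the statement looks read-only, 1 otherwise.
is_read_only_query() {
  local sql
  # Flatten newlines, lowercase, and strip leading whitespace.
  sql=$(printf '%s' "$1" | tr '\n' ' ' | tr '[:upper:]' '[:lower:]' | sed 's/^[[:space:]]*//')

  # Reject stacked statements: a semicolon followed by more text.
  case "$sql" in
    *";"*[![:space:]]*) return 1 ;;
  esac

  # Allow only statements that start with SELECT or SHOW.
  case "$sql" in
    "select "*|"show "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_read_only_query "DROP TABLE users"` returns a non-zero status, while `is_read_only_query "SELECT id FROM users"` returns zero. The check only protects anything when it runs somewhere the model cannot bypass, which is the article's argument for placing it on the server side rather than in a shell the LLM controls.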
Database Recommendation
| Scenario | Recommendation | Why |
|---|---|---|
| Local dev / personal DB | Skills + CLI | Light, fast, easy recovery |
| Production DB / shared team | MCP | Safety guardrails (query validation, access control) |
The Marketing Problem
Every SaaS now slaps "MCP supported" on their landing page — same pattern as "AI-powered" and "blockchain-based" from years past. Stability and context cost are secondary to checking a box.
How Quandri Uses All Three
- Bash + CLI for day-to-day tools (gh, psql, aws): zero context cost, full flexibility, debugs in terminal.
- Skills for repeatable multi-step workflows (commit drafting, PR reviews): loaded only when invoked.
- MCP for services without a strong CLI (Slack, Linear, Notion) and where team-wide auth matters (production DB access).
What You Should Do Now
Audit your MCP servers. Measure the token cost of their tool definitions. For every tool that has a CLI, replace the MCP server with a skill that wraps the CLI command. Keep MCP only where you need safety guardrails or where no CLI exists.
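One rough way to start the audit, assuming you have saved a server's tool definitions (for example, its `tools/list` response) to a JSON file. The sketch below uses the common rule of thumb of about 4 characters per token, which is only an approximation; a real tokenizer for your model will give different numbers. `estimate_tokens` and the file name are hypothetical.

```shell
# Rough token estimate for a file: bytes / 4, rounded up.
# This is a heuristic, not a real tokenizer.
estimate_tokens() {
  local bytes
  bytes=$(wc -c < "$1")
  echo $(( (bytes + 3) / 4 ))
}
```

Running `estimate_tokens linear-tools.json` on each saved file gives a quick ranking of which servers cost the most context, which is usually enough to decide where to look first.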
The goal isn't to eliminate MCP — it's to stop wasting 21K tokens on menus you never read.




