21,077 Tokens Wasted on Tool Definitions

MCP (Model Context Protocol) connects LLMs to tools like Linear, Notion, Slack, and Postgres. The promise: a universal plug-in. The reality: context bloat, reliability issues, and redundant complexity.

Quandri measured the tool definitions from their MCP stack. With 4 servers connected, tool schemas consumed 21,077 tokens — 10.5% of a Claude 200K context window, 16.5% of GPT-4o's 128K window. Linear alone contributed 12,807 tokens for 42 tools, most of which you never use.
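The percentages follow directly from the measured token counts. Here is a minimal sketch of the arithmetic, using only the figures quoted above; the function and variable names are illustrative and not taken from Quandri's tooling.

```python
# Figures quoted in the article; names here are illustrative only.
TOOL_DEFINITION_TOKENS = 21_077   # all 4 MCP servers combined
LINEAR_TOKENS = 12_807            # Linear's part of that total
LINEAR_TOOLS = 42

CONTEXT_WINDOWS = {"Claude 200K": 200_000, "GPT-4o 128K": 128_000}


def share_of_window(tokens: int, window: int) -> float:
    """Return tokens as a percentage of the given window (or total)."""
    return 100 * tokens / window


for name, window in CONTEXT_WINDOWS.items():
    print(f"{name}: {share_of_window(TOOL_DEFINITION_TOKENS, window):.1f}%")

# Linear's slice of the overhead and its average cost per tool.
print(f"Linear share of definitions: "
      f"{share_of_window(LINEAR_TOKENS, TOOL_DEFINITION_TOKENS):.1f}%")
print(f"Average tokens per Linear tool: {LINEAR_TOKENS / LINEAR_TOOLS:.0f}")
```

By the same numbers, Linear accounts for roughly 61% of the total definition cost, or about 305 tokens per tool.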

The Context Window as a Desk

Think of the context window as your workspace. Every MCP server dumps its entire tool catalog on that desk before you start. The original article's restaurant analogy makes the same point: you sit down, 10 menus cover the table, and there's no room for food. Every time you order, the menus come out again, because the tool definitions are resent with every request.

MCP Is Slower and Less Reliable

MCP adds a process layer between the LLM and the API. The original article benchmarked Jira's MCP server against its REST API: MCP was 3x slower per call and 9.4x slower on the first call, which includes initialization. This isn't Jira-specific: every MCP server adds the same extra layer, and with it the same kind of overhead.
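To check numbers like these against your own setup, a small timing harness is enough. The sketch below is one assumed way to structure such a benchmark, not the original article's method. `time_calls` and `slowdown` are hypothetical helpers, and `call` stands in for whatever issues one request (an MCP tool call or a plain REST request).

```python
import time
from statistics import mean
from typing import Callable, Tuple


def time_calls(call: Callable[[], object], repeats: int = 5,
               clock: Callable[[], float] = time.perf_counter) -> Tuple[float, float]:
    """Return (first-call latency, mean latency of the remaining calls)."""
    if repeats < 2:
        raise ValueError("need at least 2 calls to separate cold from warm")
    durations = []
    for _ in range(repeats):
        start = clock()
        call()
        durations.append(clock() - start)
    # The first call includes initialization, so report it separately.
    return durations[0], mean(durations[1:])


def slowdown(candidate: float, baseline: float) -> float:
    """How many times slower the candidate is than the baseline."""
    return candidate / baseline
```

Run `time_calls` once with the MCP call and once with the REST call, then compare the two cold values and the two warm values with `slowdown`. Keeping the first call separate matters because initialization cost would otherwise inflate the per-call average.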

Common failures: init failure, repeated re-auth, mid-session crashes, opaque permissions. These aren't edge cases; they're architectural.

CLI Wins on Tokens and Debugging

Compare looking up a Linear issue:

CLI approach (~200 tokens):

curl -s -H "Authorization: Bearer $LINEAR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query":"{ issue(id: \"ISSUE-ID\") { title state { name } assignee { name } } }"}' \
  https://api.linear.app/graphql

Prompt: ~50 tokens, Response: ~150 tokens.

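MCP approach (~12,957 tokens): Linear's 12,807 tokens of tool definitions sit in context before any call is made, which leaves only about 150 tokens for the call itself.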

MCP consumes roughly 65x more tokens (~12,957 vs. ~200). You can also reproduce the CLI command in a terminal immediately, whereas MCP failures only reproduce inside the conversation.
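The 65x figure can be checked from the token counts quoted in this article. Subtracting Linear's 12,807 tokens of tool definitions from the MCP total leaves 150 tokens; reading that remainder as the cost of the call itself is an inference, not something the article states.

```python
# All numbers are the approximate figures quoted in the article.
CLI_PROMPT_TOKENS = 50
CLI_RESPONSE_TOKENS = 150
CLI_TOTAL = CLI_PROMPT_TOKENS + CLI_RESPONSE_TOKENS   # ~200

MCP_TOTAL = 12_957
LINEAR_DEFINITION_TOKENS = 12_807

ratio = MCP_TOTAL / CLI_TOTAL
# Assumption: whatever is not tool definitions is the call itself.
remainder = MCP_TOTAL - LINEAR_DEFINITION_TOKENS

print(f"MCP / CLI token ratio: {ratio:.1f}x")
print(f"MCP tokens not spent on definitions: {remainder}")
```

On these figures, almost all of the MCP cost is definitions that are loaded whether or not the tools are used.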

The Skills Pattern: Load Only What You Need

Instead of spreading every menu on the table upfront, a skill works like asking a librarian for the one book you need. A Linear skill looks like:

# Linear Issue Lookup Skill
- Linear API: https://api.linear.app/graphql
- Auth: Bearer Token ($LINEAR_TOKEN env var)
- Get issue: curl -s -H "Authorization: Bearer $LINEAR_TOKEN" ...
- Search issues: change the GraphQL query and its filter arguments
- Results are JSON, parse with jq

The LLM only loads this into context when the skill is invoked. No wasted tokens.
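The load-on-demand behaviour can be sketched as a small registry. This is a hypothetical illustration, assuming each skill is a Markdown file in a single directory. `SkillRegistry` and its file layout are assumptions made for this example, not the API of any particular agent framework.

```python
from pathlib import Path
from typing import Dict, List


class SkillRegistry:
    """Lists skills by name but reads a skill's body only when it is invoked."""

    def __init__(self, skills_dir: str) -> None:
        self.skills_dir = Path(skills_dir)
        self._loaded: Dict[str, str] = {}

    def names(self) -> List[str]:
        # Cheap: only file names are touched, so listing costs little context.
        return sorted(path.stem for path in self.skills_dir.glob("*.md"))

    def load(self, name: str) -> str:
        # The expensive part, deferred until the skill is actually needed.
        if name not in self._loaded:
            path = self.skills_dir / f"{name}.md"
            self._loaded[name] = path.read_text(encoding="utf-8")
        return self._loaded[name]

    def loaded(self) -> List[str]:
        """Names of skills whose bodies have been read so far."""
        return sorted(self._loaded)
```

Only the output of `names()` needs to be in context at the start of a session. A skill's full text enters the context the first time `load()` is called for it.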

When MCP Still Makes Sense

MCP isn't dead for everyone. It's valid when you need safety guardrails, such as query validation and access control, or when no CLI exists.

Database Recommendation

| Scenario | Recommendation | Why |
| --- | --- | --- |
| Local dev / personal DB | Skills + CLI | Light, fast, easy recovery |
| Production DB / shared team | MCP | Safety guardrails (query validation, access control) |

The Marketing Problem

Every SaaS now slaps "MCP supported" on their landing page — same pattern as "AI-powered" and "blockchain-based" from years past. Stability and context cost are secondary to checking a box.

How Quandri Uses All Three

What You Should Do Now

Audit your MCP servers. Measure the token cost of their tool definitions. For every tool that has a CLI, replace the MCP server with a skill that wraps the CLI command. Keep MCP only where you need safety guardrails or where no CLI exists.
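The audit step can start as a rough estimate. The sketch below assumes you have already collected each server's tool definitions as a list of dictionaries (for example, the tools returned by an MCP `tools/list` request). It uses a crude heuristic of 4 characters per token, which is an assumption; use your model's real tokenizer for exact counts. `estimate_tokens` and `audit` are hypothetical helper names.

```python
import json
from typing import Dict, List, Tuple

# Rough heuristic (an assumption); swap in a real tokenizer for exact counts.
CHARS_PER_TOKEN = 4


def estimate_tokens(tool_definitions: List[dict]) -> int:
    """Estimate the token cost of a server's serialized tool definitions."""
    serialized = json.dumps(tool_definitions, separators=(",", ":"))
    return len(serialized) // CHARS_PER_TOKEN


def audit(servers: Dict[str, List[dict]]) -> List[Tuple[str, int, int]]:
    """Return (server, tool count, estimated tokens), most expensive first."""
    rows = [(name, len(tools), estimate_tokens(tools))
            for name, tools in servers.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    # Made-up definitions purely for illustration.
    example = {
        "tracker": [{"name": "get_issue", "description": "Fetch one issue by ID."}],
        "chat": [{"name": "post", "description": "Post a message to a channel."},
                 {"name": "search", "description": "Search message history."}],
    }
    for server, count, tokens in audit(example):
        print(f"{server}: {count} tools, ~{tokens} tokens")
```

Servers at the top of the list that also have a CLI equivalent are the first candidates to replace with a skill.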

The goal isn't to eliminate MCP — it's to stop wasting 21K tokens on menus you never read.