MCP Eats 21K Tokens: Why We Ditched It for CLI + Skills

In Quandri's four-server setup, MCP (Model Context Protocol) tool definitions consumed 10.5% of a 200K context window, and a benchmarked MCP call ran 3x slower than the equivalent REST call. Developers at Quandri replaced it with CLI wrappers and Skills, freeing 21K tokens and eliminating init failures.

4 min read · May 30, 2026

21,077 Tokens Wasted on Tool Definitions

MCP (Model Context Protocol) connects LLMs to tools like Linear, Notion, Slack, and Postgres. The promise: a universal plug-in. The reality: context bloat, reliability issues, and redundant complexity.

Quandri measured the tool definitions from their MCP stack. With 4 servers connected, tool schemas consumed 21,077 tokens — 10.5% of a Claude 200K context window, 16.5% of GPT-4o's 128K window. Linear alone contributed 12,807 tokens for 42 tools, most of which you never use.
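The percentages follow directly from the measured counts. A minimal sketch of the arithmetic, using only the figures quoted above (the function name `context_share` is made up for illustration):

```python
def context_share(tokens: int, window: int) -> float:
    """Return the percentage of a context window occupied by `tokens`."""
    return tokens / window * 100


TOOL_SCHEMA_TOKENS = 21_077  # all 4 MCP servers, as measured by Quandri
LINEAR_TOKENS = 12_807       # Linear's 42 tools alone

print(f"200K window: {context_share(TOOL_SCHEMA_TOKENS, 200_000):.1f}%")
print(f"128K window: {context_share(TOOL_SCHEMA_TOKENS, 128_000):.1f}%")
print(f"Linear's share of the total: {context_share(LINEAR_TOKENS, TOOL_SCHEMA_TOKENS):.1f}%")
```

Linear accounts for roughly 61% of the overhead on its own, which makes it the obvious first server to audit.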

The Context Window as a Desk

Think of the context window as your workspace. Every MCP server dumps its entire tool catalog on that desk before you start. The restaurant analogy from the original article fits: you sit down, 10 menus cover the table, and there's no room for food. Every time you order, the menus get pulled out again.

MCP Is Slower and Less Reliable

MCP adds a process layer between the LLM and the API. The original article benchmarked Jira MCP against Jira's REST API: MCP was 3x slower per call and 9.4x slower on the first call, which includes initialization. The cause isn't Jira-specific: every MCP server adds the same extra hop and startup step, though the size of the penalty will vary by server.
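To check these ratios against your own stack, you can time the first call separately from the warm calls. The sketch below is one way to do that, not the original article's benchmark harness (which is not shown); `benchmark` and `slowdown` are hypothetical helpers, and the callables you pass in would wrap your own MCP and REST calls.

```python
import statistics
import time
from typing import Callable


def benchmark(call: Callable[[], object], runs: int = 5,
              clock: Callable[[], float] = time.perf_counter) -> dict:
    """Time `runs` invocations; report the first call apart from the warm median."""
    if runs < 2:
        raise ValueError("need at least 2 runs to separate cold from warm calls")
    durations = []
    for _ in range(runs):
        start = clock()
        call()
        durations.append(clock() - start)
    return {"first": durations[0], "warm_median": statistics.median(durations[1:])}


def slowdown(candidate: dict, baseline: dict) -> dict:
    """Ratio of candidate to baseline, for the first call and for warm calls."""
    return {
        "first": candidate["first"] / baseline["first"],
        "warm": candidate["warm_median"] / baseline["warm_median"],
    }
```

Keeping the first call out of the median matters because initialization is paid once per session, while the per-call overhead is paid every time. Comparing first call to first call is an assumption here; the article does not say which baseline produced its 9.4x figure.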

Common failures: init failure, repeated re-auth, mid-session crashes, opaque permissions. These aren't edge cases; they're architectural.

CLI Wins on Tokens and Debugging

Compare looking up a Linear issue:

CLI approach (~200 tokens):

curl -s -H "Authorization: Bearer $LINEAR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query":"{ issue(id: \"ISSUE-ID\") { title state { name } assignee { name } } }"}' \
  https://api.linear.app/graphql

Prompt: ~50 tokens, Response: ~150 tokens.

MCP approach (~12,957 tokens):

Tool definitions always loaded: 12,807 tokens
Tool call + response: ~150 tokens

MCP consumes 65x more tokens. And you can reproduce the CLI command in a terminal immediately. MCP failures only reproduce inside the conversation.
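The 65x figure is the ratio of the two totals above. Spelled out with the article's own approximate numbers:

```python
CLI_TOKENS = 50 + 150      # prompt + response
MCP_TOKENS = 12_807 + 150  # always-loaded Linear definitions + call and response

ratio = MCP_TOKENS / CLI_TOKENS
print(f"{MCP_TOKENS:,} / {CLI_TOKENS} = {ratio:.1f}x")  # 12,957 / 200 = 64.8x
```

Since the per-call figures are estimates, the ratio is best read as "roughly 65x" rather than a precise measurement.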

The Skills Pattern: Load Only What You Need

Instead of spreading every menu on the table upfront, a Skill works like asking a librarian for the one book you need. A Linear skill looks like:

# Linear Issue Lookup Skill
- Linear API: https://api.linear.app/graphql
- Auth: Bearer Token ($LINEAR_TOKEN env var)
- Get issue: curl -s -H "Authorization: Bearer $LINEAR_TOKEN" ...
- Search issues: adjust the GraphQL query field to filter results (Linear has no JQL; filtering goes in the query itself)
- Results are JSON, parse with jq

The LLM only loads this into context when the skill is invoked. No wasted tokens.
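The load-on-demand idea can be sketched as a small registry: only a one-line summary per skill stays in context, and the full body is returned when the skill is invoked. This is a hypothetical illustration, not how any particular agent implements skills; `Skill`, `SkillRegistry`, `index`, and `load` are invented names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    summary: str  # one line, always visible to the model
    body: str     # full instructions, loaded only on invocation


class SkillRegistry:
    def __init__(self, skills: list[Skill]) -> None:
        self._skills = {skill.name: skill for skill in skills}

    def index(self) -> str:
        """Text that is always in context: one line per skill."""
        return "\n".join(f"{s.name}: {s.summary}" for s in self._skills.values())

    def load(self, name: str) -> str:
        """Full skill body, returned only when the skill is invoked."""
        return self._skills[name].body


linear = Skill(
    name="linear-issue-lookup",
    summary="Look up or search Linear issues via the GraphQL API.",
    body="# Linear Issue Lookup Skill\n"
         "- Linear API: https://api.linear.app/graphql\n"
         "- Auth: Bearer Token ($LINEAR_TOKEN env var)\n"
         "- Results are JSON, parse with jq\n",
)
registry = SkillRegistry([linear])
print(registry.index())  # the only text that is always in context
```

The design choice is that the always-loaded cost grows by one short line per skill rather than by a full schema per tool.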

When MCP Still Makes Sense

MCP isn't dead for everyone. It's valid when:

No CLI exists for the service (web-only SaaS)
Non-developer users need access
Real-time bidirectional communication is required
For databases in production: MCP servers can enforce read-only mode and block dangerous queries. Skills + CLI on their own can't stop the LLM from running DROP TABLE; that protection has to come from the database credentials.
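The read-only guardrail can be illustrated at the connection level. The sketch below uses Python's built-in `sqlite3` as a stand-in for a production database, which is an assumption for the sake of a runnable example; the article does not describe how any specific MCP server enforces read-only mode. `open_read_only` and `demo` are hypothetical names.

```python
import os
import sqlite3
import tempfile


def open_read_only(path: str) -> sqlite3.Connection:
    """Open an existing SQLite file so that every write attempt fails."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def demo() -> tuple[list, str]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")

        # Seed a throwaway database with a normal read-write connection.
        rw = sqlite3.connect(path)
        rw.execute("CREATE TABLE issues (id INTEGER PRIMARY KEY, title TEXT)")
        rw.execute("INSERT INTO issues (title) VALUES ('fix login')")
        rw.commit()
        rw.close()

        ro = open_read_only(path)
        try:
            rows = ro.execute("SELECT title FROM issues").fetchall()
            try:
                ro.execute("DROP TABLE issues")
                outcome = "dropped"
            except sqlite3.OperationalError as exc:
                outcome = f"blocked: {exc}"
        finally:
            ro.close()
        return rows, outcome


rows, outcome = demo()
print(rows, outcome)
```

Reads succeed and the `DROP TABLE` is rejected by the database itself, so the guarantee does not depend on the model choosing safe queries.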

Database Recommendation

Scenario	Recommendation	Why
Local dev / personal DB	Skills + CLI	Light, fast, easy recovery
Production DB / shared team	MCP	Safety guardrails (query validation, access control)
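The criteria from the list and table above can be condensed into one decision rule. This is a sketch of the article's reasoning, with a hypothetical `Service` type and `recommend` function; real decisions will involve more factors than four booleans.

```python
from dataclasses import dataclass


@dataclass
class Service:
    has_cli: bool
    non_developer_users: bool = False
    needs_realtime_bidirectional: bool = False
    production_database: bool = False


def recommend(service: Service) -> str:
    """Return "MCP" when any of the article's MCP criteria applies."""
    needs_mcp = (
        not service.has_cli
        or service.non_developer_users
        or service.needs_realtime_bidirectional
        or service.production_database
    )
    return "MCP" if needs_mcp else "Skills + CLI"


print(recommend(Service(has_cli=True)))                            # local dev DB with psql
print(recommend(Service(has_cli=True, production_database=True)))  # shared production DB
```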

The Marketing Problem

Every SaaS now slaps "MCP supported" on their landing page — same pattern as "AI-powered" and "blockchain-based" from years past. Stability and context cost are secondary to checking a box.

How Quandri Uses All Three

Bash + CLI for day-to-day tools (gh, psql, aws): zero context cost, full flexibility, debugs in terminal.
Skills for repeatable multi-step workflows (commit drafting, PR reviews): loaded only when invoked.
MCP for services without a strong CLI (Slack, Linear, Notion) and where team-wide auth matters (production DB access).

What You Should Do Now

Audit your MCP servers. Measure the token cost of their tool definitions. For every tool that has a CLI, replace the MCP server with a skill that wraps the CLI command. Keep MCP only where you need safety guardrails or where no CLI exists.
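A rough audit can be scripted with the ~4 characters per token heuristic and the 5% threshold from the Key Takeaways. The sketch below assumes you have already exported each server's tool definitions as a list of JSON-compatible dicts; how you obtain them depends on your client. `estimate_tokens` and `audit` are hypothetical names, and a real tokenizer will give different counts.

```python
import json
import math

CHARS_PER_TOKEN = 4  # rough heuristic, not a tokenizer


def estimate_tokens(tool_definitions: list[dict]) -> int:
    """Estimate token cost from the compact JSON size of the tool definitions."""
    serialized = json.dumps(tool_definitions, separators=(",", ":"))
    return math.ceil(len(serialized) / CHARS_PER_TOKEN)


def audit(servers: dict[str, list[dict]], window: int, budget: float = 0.05) -> dict:
    """Summarize estimated token cost per server and flag totals over `budget`."""
    per_server = {name: estimate_tokens(tools) for name, tools in servers.items()}
    total = sum(per_server.values())
    share = total / window
    return {
        "per_server": per_server,
        "total": total,
        "share": share,
        "over_budget": share > budget,
    }
```

Sorting `per_server` by cost shows which server to replace first; in Quandri's measurement that would have been Linear.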

The goal isn't to eliminate MCP — it's to stop wasting 21K tokens on menus you never read.

Editor's Take

I've been using Claude Code with MCP for three months, and the context bloat was killing me. I had to keep restarting conversations because the tool definitions pushed out relevant code. After reading this, I measured my own stack: 18% of a 128K window. I immediately replaced my Linear and Postgres MCP servers with skills. The difference is night and day — faster responses, fewer failures, and I can actually see my code in context. MCP needs a serious rethink before I trust it again.

— DevDigest Editorial

Key Takeaways

• Measure your MCP tool definition token cost. Use the ~4 chars/token heuristic to estimate. If it's over 5% of your context window, consider alternatives.
• Replace MCP servers with CLI-based skills for any tool that has a command-line interface. This reduces context usage and improves debuggability.
• Keep MCP for production databases where query safety (read-only enforcement) matters, but use skills for local dev databases.

Why It Matters

If you use LLM-powered coding tools with MCP, you may be spending 10-16% of your context window on tool definitions you rarely use, as Quandri's four-server stack did. That overhead translates to slower responses, higher costs, and more debugging. Switching to CLI-based skills can free up context and reduce failures.

#ai #developer-tools #claude #llm #cli
