The Anatomy of an LLM API Call

If you've only ever used an SDK like openai or anthropic, you've probably never seen the raw request. Six lines of code, an API key, and it just works. But understanding what actually happens is crucial for building reliable, cost-effective applications.

This post is the first in the Building TinyAgent series, where we'll build a small agent from scratch in Node.js with no frameworks—just raw API calls.

The Request

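Here's what a raw LLM API call looks like. Model IDs get retired over time, so if the one below is rejected, swap in a current ID from the provider's model list: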

const response = await fetch('https://api.anthropic.com/v1/messages', {
  method: 'POST',
  headers: {
    'x-api-key': process.env.ANTHROPIC_API_KEY,
    'content-type': 'application/json',
    'anthropic-version': '2023-06-01'
  },
  body: JSON.stringify({
    model: 'claude-3-sonnet-20240229',
    max_tokens: 1024,
    messages: [
      { role: 'user', content: 'Hello, Claude!' }
    ]
  })
});

Three key takeaways:

  • Statelessness: The API remembers nothing between calls. To build a chatbot that "remembers" earlier messages, you must hold the entire messages array and resend it every time.
  • max_tokens is a hard stop: If the model hits this limit, the response stops mid-sentence. It's not a target.
  • Near-universal pattern: Different providers use different URLs and auth headers (e.g., OpenAI uses Authorization: Bearer), but the request shape is much the same everywhere: POST, JSON, {model, messages, max_tokens}. Field names and response formats differ in the details, so switching providers is a small porting job, not a rewrite.
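
To make the statelessness point concrete, here is a minimal sketch of client-side conversation state. The helper name createConversation and its methods are hypothetical, invented for illustration; the only thing taken from the API is the shape of the request body shown above. No network call is made.

```javascript
// Minimal sketch: the client, not the API, holds the conversation state.
// createConversation, add, and buildBody are hypothetical helper names.
function createConversation(model, maxTokens) {
  const messages = []; // full history, resent on every call

  return {
    // Record one turn. role is 'user' or 'assistant'.
    add(role, content) {
      messages.push({ role, content });
    },
    // Build the JSON body for the next request: always the whole history.
    buildBody() {
      return JSON.stringify({ model, max_tokens: maxTokens, messages });
    },
  };
}

const chat = createConversation('claude-3-sonnet-20240229', 1024);
chat.add('user', 'My name is Ada.');
chat.add('assistant', 'Nice to meet you, Ada!'); // copied from an earlier response
chat.add('user', 'What is my name?');

// The next request carries all three messages, not just the latest one.
console.log(JSON.parse(chat.buildBody()).messages.length); // 3
```

If the second message were dropped from the array, the model would have no way to know the name was ever mentioned.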

The Response

The API returns a JSON blob with ~10 fields. Only three matter for now:

{
  "content": [{"text": "Hello! How can I help you today?", "type": "text"}],
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 12,
    "output_tokens": 10
  }
}

stop_reason tells you why the model stopped. In production systems, you branch on it:

  • end_turn → finished naturally, you're done
  • max_tokens → hit your ceiling, response is truncated
  • tool_use → model wants to call a tool (next post!)
  • stop_sequence → matched one of your stop strings

If you only check the text and ignore stop_reason, you will ship a bug. The response looks fine until it doesn't.
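
One way to branch on stop_reason is sketched below. The function name interpretResponse and the status labels it returns ('complete', 'truncated', 'needs_tool') are assumptions made for this example, not part of the API; only the stop_reason values and the content array come from the response shown above.

```javascript
// Sketch of branching on stop_reason. interpretResponse is a hypothetical
// helper; the returned `status` labels are our own, not part of the API.
function interpretResponse(data) {
  // content is an array of blocks; keep only the text ones and join them.
  const text = data.content
    .filter((block) => block.type === 'text')
    .map((block) => block.text)
    .join('');

  switch (data.stop_reason) {
    case 'end_turn':
    case 'stop_sequence':
      return { status: 'complete', text };
    case 'max_tokens':
      // Truncated output: retry with a larger limit or flag it to the caller.
      return { status: 'truncated', text };
    case 'tool_use':
      // The model is asking for a tool call; the text alone is not the answer.
      return { status: 'needs_tool', text };
    default:
      // Unknown reason: fail loudly instead of guessing.
      throw new Error(`Unhandled stop_reason: ${data.stop_reason}`);
  }
}

// Hand-written sample response, cut off by a low max_tokens.
const sample = {
  content: [{ type: 'text', text: 'Once upon a' }],
  stop_reason: 'max_tokens',
  usage: { input_tokens: 12, output_tokens: 3 },
};
console.log(interpretResponse(sample).status); // truncated
```

Throwing on an unrecognized value is a deliberate choice here: a new stop reason should surface as an error, not be treated as a finished answer.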

usage shows input and output token counts. Log this from day one—not after you get a surprise bill.

Tokens: Words ≠ Tokens

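Tokens are the units the model reads and bills in. Exact counts depend on each model's tokenizer, so treat the numbers below as illustrative. Some surprising facts: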

  • "Unbelievable" is one word but four tokens. The tokenizer splits on common substrings, not spaces.
  • Code costs more: def add(a, b): is 8 tokens. Brackets, commas, and colons usually become tokens of their own.
  • JSON is expensive: {"a":1} is 7 tokens. Bloated tool schemas quietly eat your budget on every request.
  • Non-English costs more: Japanese, Hindi, Arabic tend to run 2–4× the token count of the same content in English. This changes your cost math for global apps.

Rule of thumb: For English prose, ~1 token ≈ 4 characters ≈ 0.75 words. For anything else, run it through the tokenizer yourself.
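
The rule of thumb translates into a one-line estimator. This is only a rough sketch for English prose, and estimateTokens is a hypothetical helper name; the real count is whatever the API reports in usage.

```javascript
// Rough estimate from the ~4 characters per token rule of thumb.
// Only reasonable for English prose; estimateTokens is a hypothetical helper.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

const prompt = 'Summarize the following meeting notes in three bullet points.';
console.log('Estimated tokens:', estimateTokens(prompt));
```

A useful habit is to log this estimate next to usage.input_tokens for a while and see how far off it runs for your own content.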

The Bill: Input vs Output Pricing

Two meters run on every call, priced differently:

cost = (input_tokens / 1,000,000) × input_price
     + (output_tokens / 1,000,000) × output_price

Output tokens typically cost 3–5× as much as input tokens. That's the one number to internalize.
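
The formula above can be applied directly to the usage object from a response. In this sketch, the prices are placeholders chosen to show a 5× gap, not any provider's actual rates, and estimateCost is a hypothetical helper name.

```javascript
// Per-million-token prices in USD. These numbers are placeholders for
// illustration, not any provider's actual rates.
const PRICES = { inputPerMillion: 3, outputPerMillion: 15 };

// Apply the two-meter formula to a usage object from an API response.
function estimateCost(usage, prices) {
  const inputCost = (usage.input_tokens / 1_000_000) * prices.inputPerMillion;
  const outputCost = (usage.output_tokens / 1_000_000) * prices.outputPerMillion;
  return inputCost + outputCost;
}

// 2,000 tokens in, 500 tokens out.
const usage = { input_tokens: 2000, output_tokens: 500 };
console.log('$' + estimateCost(usage, PRICES).toFixed(4)); // $0.0135
```

With these placeholder prices, the 500 output tokens ($0.0075) cost more than the 2,000 input tokens ($0.0060).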

Three consequences:

  • Long prompts are cheap, long responses are expensive. Stuffing 50 KB of context into a system prompt is fine. Asking for 50 KB of output costs ~5× more.
  • "Thinking" tokens count as output. Reasoning models bill internal thought at the output rate, even though you don't see it.
  • Tool schemas eat input on every call. They're resent with every request, just like the system prompt.

At $0.006 per call, 100k calls/day is $600/day, or roughly $18,000/month, from one small feature. Add usage logging now, not when you get the alert.
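
Usage logging does not need to be elaborate to be useful. Below is a minimal in-memory sketch; createUsageLog and its methods are hypothetical names, and a real application would likely write these numbers to its existing logging or metrics system instead.

```javascript
// Sketch of a running usage log kept in memory.
// createUsageLog, record, and snapshot are hypothetical helper names.
function createUsageLog() {
  const totals = { calls: 0, input_tokens: 0, output_tokens: 0 };

  return {
    // Call once per API response, passing data.usage.
    record(usage) {
      totals.calls += 1;
      totals.input_tokens += usage.input_tokens;
      totals.output_tokens += usage.output_tokens;
    },
    // Return a copy so callers cannot mutate the running totals.
    snapshot() {
      return { ...totals };
    },
  };
}

const usageLog = createUsageLog();
usageLog.record({ input_tokens: 12, output_tokens: 10 });
usageLog.record({ input_tokens: 340, output_tokens: 95 });
console.log(usageLog.snapshot());
// { calls: 2, input_tokens: 352, output_tokens: 105 }
```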

The Whole Thing in 20 Lines

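Here's a complete, dependency-free Node.js script. It needs Node 18 or later for the built-in fetch, and because it uses top-level await it must run as an ES module (for example, saved with a .mjs extension):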

const API_KEY = process.env.ANTHROPIC_API_KEY;
const response = await fetch('https://api.anthropic.com/v1/messages', {
  method: 'POST',
  headers: {
    'x-api-key': API_KEY,
    'content-type': 'application/json',
    'anthropic-version': '2023-06-01'
  },
  body: JSON.stringify({
    model: 'claude-3-sonnet-20240229',
    max_tokens: 1024,
    messages: [{ role: 'user', content: 'Hello!' }]
  })
});
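const data = await response.json();
// Error responses are JSON too, but they have no content array to read.
if (!response.ok) throw new Error(`API error ${response.status}: ${JSON.stringify(data)}`);
console.log(data.content[0].text);
console.log('Stop reason:', data.stop_reason);
console.log('Usage:', data.usage);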

Three Things to Try Before the Next Post

  1. Run it and watch the numbers: Make ten calls, change the prompt length, see how usage moves. You'll build a real instinct for cost faster than reading any doc.
  2. Set max_tokens: 20 and ask for something long: Watch it cut off. Check stop_reason. This is a bug you'll hit in production—meet it on purpose now.
  3. Build a multi-turn chat by hand: Keep a messages array, push each user message and model reply onto it, and resend the whole thing every turn. You'll immediately understand why long conversations get expensive—you're paying for the full history on every call.
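
For the third exercise, the growth in input tokens can be estimated before making a single call. The sketch below assumes every user message and every reply has a fixed size, which real conversations do not; inputTokensPerTurn is a hypothetical helper, and the numbers only show the shape of the curve.

```javascript
// Assumption: every user message and every reply is the same size.
// Real conversations vary; this only shows the shape of the growth.
function inputTokensPerTurn(turns, tokensPerUserMsg, tokensPerReply) {
  const perTurn = [];
  let history = 0; // tokens currently in the messages array

  for (let turn = 1; turn <= turns; turn++) {
    history += tokensPerUserMsg; // new user message joins the history
    perTurn.push(history);       // the whole history is billed as input
    history += tokensPerReply;   // the reply is appended for the next turn
  }
  return perTurn;
}

const perTurn = inputTokensPerTurn(5, 50, 150);
console.log(perTurn); // [ 50, 250, 450, 650, 850 ]
console.log(perTurn.reduce((sum, n) => sum + n, 0)); // 2250
```

Five turns contain only 250 tokens of new user text, yet 2,250 input tokens are billed in total, because each call pays again for everything that came before it.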

In the next post, we'll expand TinyAgent to handle tool calls. Happy coding!