DeepSeek V4 Pro at 75% Off – Pricing, Specs, and What You Need to Know

DeepSeek just updated their pricing page, and the headline is hard to miss: the V4 Pro model is 75% off until May 31, 2026. If you've been on the fence about trying their API, this is a strong signal to jump in.

The Models

Two models are available: deepseek-v4-flash and deepseek-v4-pro. Both support 1 million token context windows with a maximum output of 384K tokens. That's massive – enough to process entire codebases or long documents in a single pass.

Both models support JSON output, tool calls, and chat prefix completion (beta). The Flash model also supports FIM (Fill-in-the-Middle) completion in non-thinking mode – useful for code completion tasks.
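As a sketch of what a FIM request could look like: DeepSeek's earlier FIM API used an OpenAI-style completions body with `prompt` and `suffix` fields, so the payload below follows that shape. The field names and the assumption that V4 Flash keeps this format are unconfirmed; treat this as illustrative only.

```python
import json

# Sketch of a FIM (fill-in-the-middle) request body in the OpenAI-style
# completions format DeepSeek has previously used for FIM. Field names are
# assumptions carried over from that earlier API, not confirmed for V4.
def build_fim_payload(prefix: str, suffix: str, max_tokens: int = 64) -> str:
    body = {
        "model": "deepseek-v4-flash",  # per the post, FIM is Flash-only (non-thinking)
        "prompt": prefix,              # code before the gap to fill
        "suffix": suffix,              # code after the gap
        "max_tokens": max_tokens,
    }
    return json.dumps(body)

payload = build_fim_payload("def add(a, b):\n    return ", "\n\nprint(add(1, 2))")
```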

Pricing Breakdown

DeepSeek V4 Flash

  • Input (cache hit): $0.0028 per million tokens
  • Input (cache miss): $0.14 per million tokens
  • Output: $0.28 per million tokens

DeepSeek V4 Pro (75% off until May 31, 2026)

  • Input (cache hit): $0.003625 per million tokens (was $0.0145)
  • Input (cache miss): $0.435 per million tokens (was $1.74)
  • Output: $0.87 per million tokens (was $3.48)

As of April 26, 2026, the cache hit price for both models has been cut to one tenth of the launch price. If your prompts hit the cache (common for repetitive tasks), you pay a small fraction of the already low per-token rate.
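The arithmetic above is easy to sanity-check with a small calculator. The rates are the ones listed in this post (per million tokens); they are illustrative, not canonical.

```python
# Estimate request cost from the per-million-token rates listed above.
# Prices come from the post's pricing breakdown; adjust if rates change.
PRICES = {
    "deepseek-v4-flash": {"cache_hit": 0.0028, "cache_miss": 0.14, "output": 0.28},
    "deepseek-v4-pro": {"cache_hit": 0.003625, "cache_miss": 0.435, "output": 0.87},
}

def estimate_cost(model, cache_hit_tokens, cache_miss_tokens, output_tokens):
    """Return the estimated USD cost for one request."""
    p = PRICES[model]
    return (cache_hit_tokens * p["cache_hit"]
            + cache_miss_tokens * p["cache_miss"]
            + output_tokens * p["output"]) / 1_000_000

# Example: 1M cached input + 1M fresh input + 1M output on Pro
pro_cost = estimate_cost("deepseek-v4-pro", 1_000_000, 1_000_000, 1_000_000)
# -> about $1.31 at the discounted rates
```

Note how heavily the total is dominated by cache misses and output; the cache hit contribution is nearly negligible, which is why prefix reuse matters so much.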

Thinking Mode

Both models support thinking and non-thinking modes, and both default to thinking mode. You can switch modes via the API, which is useful for choosing between tasks that need step-by-step reasoning and tasks that just need fast responses.
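The post does not document the exact API field for switching modes, so the sketch below uses a hypothetical `"thinking"` key purely to illustrate the idea of toggling per request. Do not treat the field name as real; check the API docs for the actual parameter.

```python
# Sketch: attaching a mode switch to a chat payload. The "thinking" field
# below is a HYPOTHETICAL placeholder -- the post does not name the real
# parameter -- included only to show per-request mode selection.
def chat_payload(model: str, messages: list, thinking: bool) -> dict:
    payload = {"model": model, "messages": messages}
    payload["thinking"] = {"type": "enabled" if thinking else "disabled"}  # hypothetical field
    return payload

# Fast path: skip reasoning for a simple lookup-style query
fast = chat_payload("deepseek-v4-flash",
                    [{"role": "user", "content": "What is the capital of France?"}],
                    thinking=False)
```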

API Compatibility

The DeepSeek API is compatible with OpenAI format at https://api.deepseek.com and Anthropic format at https://api.deepseek.com/anthropic. This means you can drop it into existing projects with minimal code changes.
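Because the API follows the OpenAI format, a request is just a standard JSON POST to the base URL above. The sketch below builds one with the standard library; only the base URL and the OpenAI-format compatibility come from the post, and the `"sk-..."` key is a placeholder.

```python
import json
import urllib.request

# Sketch: an OpenAI-format chat completion request against the DeepSeek
# endpoint mentioned in the post. Run with a real API key to actually send.
def build_request(api_key: str, model: str, messages: list) -> urllib.request.Request:
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        "https://api.deepseek.com/chat/completions",  # OpenAI-compatible path
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_request("sk-...", "deepseek-v4-pro",
                    [{"role": "user", "content": "Summarize this repo."}])
# send with: urllib.request.urlopen(req)
```

If you already use an OpenAI client library, pointing its base URL at `https://api.deepseek.com` should achieve the same thing with even less code.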

Deprecation Warning

Note that the model names deepseek-chat and deepseek-reasoner will be deprecated. They currently map to non-thinking and thinking modes of V4 Flash, respectively. Update your code to use deepseek-v4-flash explicitly.
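If your codebase references the old names in several places, a small mapping table makes the migration mechanical. This encodes exactly the mapping stated above: deepseek-chat becomes V4 Flash in non-thinking mode, deepseek-reasoner becomes V4 Flash in thinking mode.

```python
# Map deprecated model names to their V4 replacements, per the deprecation
# note: deepseek-chat -> Flash non-thinking, deepseek-reasoner -> Flash thinking.
DEPRECATED = {
    "deepseek-chat": ("deepseek-v4-flash", False),     # non-thinking mode
    "deepseek-reasoner": ("deepseek-v4-flash", True),  # thinking mode
}

def migrate(model: str) -> tuple:
    """Return (new_model, thinking_mode); thinking_mode is None if unchanged."""
    return DEPRECATED.get(model, (model, None))
```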

What This Means for Developers

If you're building AI-powered features that require large context windows or tool calling, the V4 Pro at 75% off is a bargain. The cache hit pricing is strikingly low: roughly $0.0036 per million tokens for Pro, competitive with even the cheapest alternatives.

Next Steps

  1. Check your current DeepSeek usage and see if you can benefit from the Pro model's thinking capabilities.
  2. Migrate from deprecated model names (deepseek-chat, deepseek-reasoner) to deepseek-v4-flash or deepseek-v4-pro.
  3. Optimize your prompts to maximize cache hits – structure inputs to reuse common prefixes.
  4. Act before May 31, 2026, to lock in the 75% discount on Pro.
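Step 3 above deserves a concrete illustration. Prompt caches generally match on a shared leading span of tokens, so keeping your system prompt and few-shot examples byte-identical across calls, with only the user input varying at the end, maximizes hits. The names and message content below are illustrative.

```python
# Sketch: keep a stable shared prefix (system prompt + few-shot examples)
# so repeated requests hit the prompt cache; only the final user message
# varies between calls. Content here is illustrative.
SHARED_PREFIX = [
    {"role": "system", "content": "You are a code review assistant."},
    {"role": "user", "content": "Example diff..."},          # few-shot example
    {"role": "assistant", "content": "Example review..."},   # few-shot answer
]

def build_messages(user_input: str) -> list:
    # Identical prefix across calls -> those tokens bill at the cache hit rate.
    return SHARED_PREFIX + [{"role": "user", "content": user_input}]

msgs = build_messages("Review this function for off-by-one errors.")
```

The corollary: avoid putting anything volatile (timestamps, request IDs) near the top of the prompt, since any change there invalidates the cached prefix.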

The pricing page is live at api-docs.deepseek.com/quick_start/pricing. Go update your configs.