DeepSeek V4 Pro at 75% Off – Pricing, Specs, and What You Need to Know

DeepSeek just updated their pricing page, and the headline is hard to miss: the V4 Pro model is 75% off until May 31, 2026. If you've been on the fence about trying their API, this is a strong signal to jump in.

The Models

Two models are available: deepseek-v4-flash and deepseek-v4-pro. Both support 1 million token context windows with a maximum output of 384K tokens. That's massive – enough to process entire codebases or long documents in a single pass.

Both models support JSON output, tool calls, and chat prefix completion (beta). The Flash model also supports FIM (Fill-in-the-Middle) completion in non-thinking mode – useful for code completion tasks.
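As a sketch of what a FIM request could look like: DeepSeek's earlier FIM API used an OpenAI-style completions body with `prompt` and `suffix` fields, so the payload below follows that shape. The field names and the assumption that V4 Flash keeps this format are unconfirmed; treat this as illustrative only.

```python
import json

# Sketch of a FIM (fill-in-the-middle) request body in the OpenAI-style
# completions format DeepSeek has previously used for FIM. Field names are
# assumptions carried over from that earlier API, not confirmed for V4.
def build_fim_payload(prefix: str, suffix: str, max_tokens: int = 64) -> str:
    body = {
        "model": "deepseek-v4-flash",  # per the post, FIM is Flash-only (non-thinking)
        "prompt": prefix,              # code before the gap to fill
        "suffix": suffix,              # code after the gap
        "max_tokens": max_tokens,
    }
    return json.dumps(body)

payload = build_fim_payload("def add(a, b):\n    return ", "\n\nprint(add(1, 2))")
```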

Pricing Breakdown

DeepSeek V4 Flash

  • Input (cache hit): $0.0028 per million tokens
  • Input (cache miss): $0.14 per million tokens
  • Output: $0.28 per million tokens

DeepSeek V4 Pro (75% off until May 31, 2026)

  • Input (cache hit): $0.003625 per million tokens (was $0.0145)
  • Input (cache miss): $0.435 per million tokens (was $1.74)
  • Output: $0.87 per million tokens (was $3.48)

As of April 26, 2026, the cache hit price for both models has been cut to one tenth of the launch price. If your prompts hit the cache (common for repetitive tasks), you pay a small fraction of the already low per-token rate.
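The arithmetic above is easy to sanity-check with a small calculator. The rates are the ones listed in this post (per million tokens); they are illustrative, not canonical.

```python
# Estimate request cost from the per-million-token rates listed above.
# Prices come from the post's pricing breakdown; adjust if rates change.
PRICES = {
    "deepseek-v4-flash": {"cache_hit": 0.0028, "cache_miss": 0.14, "output": 0.28},
    "deepseek-v4-pro": {"cache_hit": 0.003625, "cache_miss": 0.435, "output": 0.87},
}

def estimate_cost(model, cache_hit_tokens, cache_miss_tokens, output_tokens):
    """Return the estimated USD cost for one request."""
    p = PRICES[model]
    return (cache_hit_tokens * p["cache_hit"]
            + cache_miss_tokens * p["cache_miss"]
            + output_tokens * p["output"]) / 1_000_000

# Example: 1M cached input + 1M fresh input + 1M output on Pro
pro_cost = estimate_cost("deepseek-v4-pro", 1_000_000, 1_000_000, 1_000_000)
# -> about $1.31 at the discounted rates
```

Note how heavily the total is dominated by cache misses and output; the cache hit contribution is nearly negligible, which is why prefix reuse matters so much.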

Thinking Mode

Both models support thinking and non-thinking modes, and both default to thinking mode. You can switch modes via the API, which is useful for choosing between tasks that need step-by-step reasoning and tasks that just need fast responses.
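The post does not document the exact API field for switching modes, so the sketch below uses a hypothetical `"thinking"` key purely to illustrate the idea of toggling per request. Do not treat the field name as real; check the API docs for the actual parameter.

```python
# Sketch: attaching a mode switch to a chat payload. The "thinking" field
# below is a HYPOTHETICAL placeholder -- the post does not name the real
# parameter -- included only to show per-request mode selection.
def chat_payload(model: str, messages: list, thinking: bool) -> dict:
    payload = {"model": model, "messages": messages}
    payload["thinking"] = {"type": "enabled" if thinking else "disabled"}  # hypothetical field
    return payload

# Fast path: skip reasoning for a simple lookup-style query
fast = chat_payload("deepseek-v4-flash",
                    [{"role": "user", "content": "What is the capital of France?"}],
                    thinking=False)
```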

API Compatibility

The DeepSeek API is compatible with OpenAI format at https://api.deepseek.com and Anthropic format at https://api.deepseek.com/anthropic. This means you can drop it into existing projects with minimal code changes.
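Because the API follows the OpenAI format, a request is just a standard JSON POST to the base URL above. The sketch below builds one with the standard library; only the base URL and the OpenAI-format compatibility come from the post, and the `"sk-..."` key is a placeholder.

```python
import json
import urllib.request

# Sketch: an OpenAI-format chat completion request against the DeepSeek
# endpoint mentioned in the post. Run with a real API key to actually send.
def build_request(api_key: str, model: str, messages: list) -> urllib.request.Request:
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        "https://api.deepseek.com/chat/completions",  # OpenAI-compatible path
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_request("sk-...", "deepseek-v4-pro",
                    [{"role": "user", "content": "Summarize this repo."}])
# send with: urllib.request.urlopen(req)
```

If you already use an OpenAI client library, pointing its base URL at `https://api.deepseek.com` should achieve the same thing with even less code.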

Deprecation Warning

Note that the model names deepseek-chat and deepseek-reasoner will be deprecated. They currently map to non-thinking and thinking modes of V4 Flash, respectively. Update your code to use deepseek-v4-flash explicitly.
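If your codebase references the old names in several places, a small mapping table makes the migration mechanical. This encodes exactly the mapping stated above: deepseek-chat becomes V4 Flash in non-thinking mode, deepseek-reasoner becomes V4 Flash in thinking mode.

```python
# Map deprecated model names to their V4 replacements, per the deprecation
# note: deepseek-chat -> Flash non-thinking, deepseek-reasoner -> Flash thinking.
DEPRECATED = {
    "deepseek-chat": ("deepseek-v4-flash", False),     # non-thinking mode
    "deepseek-reasoner": ("deepseek-v4-flash", True),  # thinking mode
}

def migrate(model: str) -> tuple:
    """Return (new_model, thinking_mode); thinking_mode is None if unchanged."""
    return DEPRECATED.get(model, (model, None))
```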

What This Means for Developers

If you're building AI-powered features that require large context windows or tool calling, the V4 Pro at 75% off is a bargain. The cache hit pricing is strikingly low: roughly $0.0036 per million tokens for Pro, competitive with even the cheapest alternatives.

Next Steps

  1. Check your current DeepSeek usage and see if you can benefit from the Pro model's thinking capabilities.
  2. Migrate from deprecated model names (deepseek-chat, deepseek-reasoner) to deepseek-v4-flash or deepseek-v4-pro.
  3. Optimize your prompts to maximize cache hits – structure inputs to reuse common prefixes.
  4. Act before May 31, 2026, to lock in the 75% discount on Pro.
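Step 3 above deserves a concrete illustration. Prompt caches generally match on a shared leading span of tokens, so keeping your system prompt and few-shot examples byte-identical across calls, with only the user input varying at the end, maximizes hits. The names and message content below are illustrative.

```python
# Sketch: keep a stable shared prefix (system prompt + few-shot examples)
# so repeated requests hit the prompt cache; only the final user message
# varies between calls. Content here is illustrative.
SHARED_PREFIX = [
    {"role": "system", "content": "You are a code review assistant."},
    {"role": "user", "content": "Example diff..."},          # few-shot example
    {"role": "assistant", "content": "Example review..."},   # few-shot answer
]

def build_messages(user_input: str) -> list:
    # Identical prefix across calls -> those tokens bill at the cache hit rate.
    return SHARED_PREFIX + [{"role": "user", "content": user_input}]

msgs = build_messages("Review this function for off-by-one errors.")
```

The corollary: avoid putting anything volatile (timestamps, request IDs) near the top of the prompt, since any change there invalidates the cached prefix.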

The pricing page is live at api-docs.deepseek.com/quick_start/pricing. Go update your configs.