Hy3 LLM Tops OpenRouter Rankings—But DeepSeek V4 Flash Is Ch

Hy3 LLM Tops OpenRouter Rankings—But DeepSeek V4 Flash Is Cheaper

The mysterious Hy3 preview model from Tencent is leading OpenRouter's ranking by token usage, yet its quality doesn't match top models. Analysis reveals DeepSeek V4 Flash with DeepSeek's own provider offers a far lower effective price due to aggressive cache read costs, making it the real bargain for developers.

3 min readMay 29, 2026

Hy3 LLM Tops OpenRouter Rankings—But DeepSeek V4 Flash Is Cheaper

The Mysterious Rise of Hy3

OpenRouter's AI Model Rankings show two models beating Claude in token usage by over 50%: DeepSeek Flash V4 and Hy3 preview. DeepSeek Flash V4 is a known open-source model with strong performance at low cost. Hy3 preview, however, is a mystery. Released by Chinese megacorp Tencent, its Hugging Face page is sparse and shows unimpressive benchmark results compared to other Chinese open-source models. A Hacker News search returns only one unrelated submission. Reddit discussions focus on its open-weights release, with a thread noting the model was free on OpenRouter until May 6, after which it became paid.

Despite this, Hy3 preview's usage has remained steady after the free period ended. On May 25, 2026, it was still ranked #2 by token usage. The model costs $0.066/1M input tokens, cheaper than DeepSeek V4 Flash's $0.10/1M. But quality is not on par with Claude Opus 4.7 or GPT 5.5. So why the popularity?

Data-Driven Investigation

OpenRouter's detailed data reveals that Hy3 preview's usage is organic—not a one-off spike from a single app switching defaults. The top 5 apps account for less than 1% of its activity. The model is served by only one provider: Singapore-based SiliconFlow. Before Hy3, SiliconFlow had little usage; after, it spiked. The transition from free to paid did not cause a drop, suggesting users find value.

The Real Cost of LLMs: Cache Economics

LLM API calls are stateless. Each turn reprocesses all tokens in the conversation, making input tokens dominate costs. For agentic workflows, input tokens can be 98% of total tokens. Providers implement prompt caching to reuse previously processed tokens, passing savings to customers. Most providers charge 10% of input cost for cache reads (OpenAI, Anthropic, Google). For DeepSeek V4 Flash, most providers charge 20%–50%—except DeepSeek itself.

DeepSeek's own provider charges a mere 2% cache read cost for DeepSeek V4 Flash, thanks to a new KV caching approach introduced with V4. The Pro variant has a cache read cost of 0.83%. This drastically reduces the effective price. OpenRouter now shows effective pricing tables on model pages. For DeepSeek V4 Flash via DeepSeek, the effective price is $0.018/1M input tokens. Hy3 preview via SiliconFlow has a 44% cache read cost, yielding an effective price of $0.034/1M. DeepSeek V4 Flash is nearly half the price.

Why Not Switch?

Given the cost advantage, why is Hy3 more popular? One reason: some developers may not want to route data through DeepSeek, a Chinese company with a data policy that includes training on prompts. Another: OpenRouter's automatic routing may not always select DeepSeek as provider; some clients may not support explicit provider selection. Subscription services like Claude Code or Codex are still better for heavy users, but for pay-as-you-go, DeepSeek V4 Flash via DeepSeek is the cheapest option.

Conclusion

The mystery of Hy3's popularity likely boils down to a single large app using it as a data-processing backbone, not an agentic coding app. But with the effective price data now public, expect a shift. Check OpenRouter's effective pricing table for your model. If you can use DeepSeek directly, you'll save over 50% compared to Hy3.

# Example: query DeepSeek V4 Flash via OpenRouter with explicit provider
curl https://openrouter.ai/api/v1/chat/completions \
  -H &#34;Authorization: Bearer $OPENROUTER_API_KEY&#34; \
  -H &#34;Content-Type: application/json&#34; \
  -d &#39;{
    &#34;model&#34;: &#34;deepseek/deepseek-v4-flash&#34;,
    &#34;provider&#34;: {&#34;order&#34;: [&#34;DeepSeek&#34;]},
    &#34;messages&#34;: [{&#34;role&#34;: &#34;user&#34;, &#34;content&#34;: &#34;Hello&#34;}]
  }&#39;

Editor's Take

I've been using OpenRouter for months and never bothered checking effective prices. After reading this, I ran the numbers for my own agentic workflows—DeepSeek V4 Flash via DeepSeek is indeed 2x cheaper than Hy3. I'm switching my routing config today. The cache economics are a game-changer, but only if you pick the right provider.

— DevDigest Editorial

Key Takeaways

•Check OpenRouter's effective pricing table before choosing a model—stated prices are misleading due to cache read cost differences.
•For agentic workflows with high input token ratios, prioritize models with low cache read costs (e.g., DeepSeek V4 Flash via DeepSeek at 2%).
•Consider geopolitical and data policy implications when choosing a provider; DeepSeek's data policy allows training on prompts.

Why It Matters

LLM API costs are dominated by input tokens, which are aggressively cached. Understanding effective pricing versus stated prices can save you significant money. DeepSeek V4 Flash with DeepSeek's own provider offers the lowest effective cost, but geopolitical and data policy concerns may drive some to use alternatives like Hy3.

#developer-tools#deepseek#api#llm#openrouter

Get the weekly digest

Every Sunday - top tech stories, industry breakthroughs, and developer tools delivered to your inbox.

No spam, unsubscribe anytime.

Hy3 LLM Tops OpenRouter Rankings—But DeepSeek V4 Flash Is Cheaper

The Mysterious Rise of Hy3

Data-Driven Investigation

The Real Cost of LLMs: Cache Economics

Why Not Switch?

Conclusion

Editor's Take

Key Takeaways

Why It Matters

Get the weekly digest

You might also like

100 Lines of Lisp: An AI Agent That Writes Its Own Tools

xAI Grok CLI 0.2.93 Uploads Secrets and Whole Repo Unredacted

OpenAI Launches ChatGPT Work Agent Powered by GPT-5.6

AI Agent Safety: Semantic Layer Beats Pattern Matching 12/12 on Rule Override Detection

Mass Assignment Vulnerabilities: How One JSON Field Hands Attackers Admin Access

Zephyr Cloud's Bug-Hunting Agent Runs 24/7, Finds Silent Failures