GLM-5.2 vs Opus 4.8: A Real-World Vibe Test
Z.ai released GLM-5.2, an open-weight model under the MIT license, and positions it between Claude Opus 4.7 and 4.8. To cut through the hype, we ran GLM-5.2 and Opus 4.8 head-to-head on a single task: build a 3D platformer from scratch in raw WebGL, with no engines or libraries. The results reveal a clear tradeoff between cost and polish.
The Setup
Both models received the same one-shot prompt: create a 3D platformer with a GLB model parser, matrix math, GLSL shaders, skinned animation, collision detection, and a follow camera. We provided identical CC0 assets from Kenney's Platformer Kit. Opus 4.8 ran in Claude Code with extended thinking; GLM-5.2 ran in Pi over OpenRouter with thinking set to High (not Max).
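To give a sense of what the "GLB model parser" part of the prompt involves, here is a minimal sketch of reading the GLB container, written in Python for brevity rather than the JavaScript the models produced. It follows the binary glTF 2.0 layout (a 12-byte header followed by length-prefixed chunks); the function name `parse_glb` is ours, and this is not code taken from either model's output.

```python
import json
import struct

GLB_MAGIC = 0x46546C67   # ASCII "glTF" read as a little-endian uint32
CHUNK_JSON = 0x4E4F534A  # ASCII "JSON"
CHUNK_BIN = 0x004E4942   # ASCII "BIN" followed by a NUL byte


def parse_glb(data: bytes):
    """Split a GLB blob into its glTF JSON document and binary buffer."""
    # Header: magic, container version, total file length (all uint32, little-endian).
    magic, version, length = struct.unpack_from("<III", data, 0)
    if magic != GLB_MAGIC:
        raise ValueError("not a GLB file")
    if version != 2:
        raise ValueError(f"unsupported GLB version {version}")
    if length > len(data):
        raise ValueError("file is truncated")

    gltf, binary = None, None
    offset = 12
    while offset + 8 <= length:
        # Each chunk: payload length, chunk type, then the payload itself.
        chunk_len, chunk_type = struct.unpack_from("<II", data, offset)
        start = offset + 8
        payload = data[start:start + chunk_len]
        if chunk_type == CHUNK_JSON:
            # JSON chunks are space-padded to 4 bytes; json.loads ignores that.
            gltf = json.loads(payload.decode("utf-8"))
        elif chunk_type == CHUNK_BIN and binary is None:
            binary = payload
        offset = start + chunk_len

    if gltf is None:
        raise ValueError("GLB has no JSON chunk")
    return gltf, binary
```

A real parser still has to walk the accessors, buffer views, skins, and animations described in the JSON, which is where most of the work in this task lies.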
The Numbers
| Metric | GLM-5.2 | Opus 4.8 |
|---|---|---|
| Wall-clock build time | 1h 10m 40s | 33m 30s |
| Output tokens | 131,000 | 216,809 |
| Peak context window | 16% of 1M | 19% of 1M |
| Tool calls | 128 | 153 |
| Cost | $5.39 | ~$21.92 |
GLM-5.2 cost roughly one-quarter of what Opus did, but took a little more than twice as long.
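The ratios can be checked directly from the table. The short script below only restates the reported figures; the Opus cost is approximate in the source, so the cost ratio is approximate too.

```python
# Figures copied from the table above.
glm_seconds = 1 * 3600 + 10 * 60 + 40   # 1h 10m 40s
opus_seconds = 33 * 60 + 30             # 33m 30s
glm_cost = 5.39                         # USD
opus_cost = 21.92                       # USD, reported as approximate

time_ratio = glm_seconds / opus_seconds
cost_ratio = glm_cost / opus_cost

print(f"GLM-5.2 took {time_ratio:.2f}x as long as Opus 4.8")    # 2.11x
print(f"GLM-5.2 cost {cost_ratio:.1%} of the Opus 4.8 run")     # 24.6%
```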
Game Quality
Both games run in the browser with WASD controls, mouse camera orbit, and a goal to collect coins and reach a flag. Here's how they differed.
GLM-5.2's game was rough:
- Character faces backward while moving forward.
- Textures missing — character renders flat gray because the renderer never loaded the shared color palette file.
- Spike hazard doesn't kill the player.
- Reaching the flag triggers no win condition.
- Debug overlay remained on screen.
Opus's game was cleaner:
- Camera, controls, and collision worked correctly.
- Spike kills the player (though placed off the main path).
- Flag triggers a win condition.
- Animations and textures applied properly.
- Only two minor bugs: coyote-time grace period slightly too generous (character stands on air), and win triggers from too far away.
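For readers unfamiliar with coyote time: it is a short window after the player leaves a ledge during which a jump is still accepted. The sketch below shows the usual shape of that logic in Python (the games themselves are JavaScript). The class, field names, and the 0.1-second window are hypothetical and not taken from Opus's code.

```python
from dataclasses import dataclass

COYOTE_TIME = 0.1  # seconds of grace after leaving a ledge (hypothetical value)


@dataclass
class Player:
    on_ground: bool = True
    time_since_grounded: float = 0.0

    def update(self, dt: float, on_ground: bool) -> None:
        """Track how long the player has been airborne."""
        self.on_ground = on_ground
        if on_ground:
            self.time_since_grounded = 0.0
        else:
            self.time_since_grounded += dt

    def can_jump(self) -> bool:
        # The grace window only widens jump eligibility.
        # Gravity should keep applying whenever on_ground is False.
        return self.on_ground or self.time_since_grounded <= COYOTE_TIME


player = Player()
for _ in range(3):                 # three frames at 60 fps after walking off a ledge
    player.update(1 / 60, on_ground=False)
print(player.can_jump())           # still inside the grace window
```

One plausible reading of the "stands on air" bug is that the window is too long, or that the grace period also suppresses gravity instead of only affecting jump eligibility. We have not confirmed which applies here.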
The Multimodal Gap
Opus can read images; GLM-5.2 is text-only. Both models were instructed to verify their work. Opus took a screenshot, inspected it, and noticed the debug overlay — then removed it. GLM-5.2 tried to verify by writing a script to sample pixel colors from the saved frame. It confirmed "grass green, dirt brown, coin gold, flag red" and stopped, never seeing that the character was gray or the overlay was on. On visual tasks, multimodality is a decisive advantage.
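A small sketch shows why a pixel-sampling check of this kind can pass on a broken frame. It is a reconstruction of the general idea in Python, not GLM-5.2's actual script; the RGB values, tolerance, and function names are all made up for illustration.

```python
def close(a, b, tol=30):
    """True if two RGB triples differ by at most `tol` per channel."""
    return all(abs(x - y) <= tol for x, y in zip(a, b))


def colors_found(pixels, expected, tol=30):
    """Report, per expected color, whether any sampled pixel matches it."""
    return {
        name: any(close(p, rgb, tol) for p in pixels)
        for name, rgb in expected.items()
    }


# Hypothetical target colors for the four things the check looked for.
EXPECTED = {
    "grass green": (90, 180, 70),
    "dirt brown": (130, 90, 50),
    "coin gold": (240, 200, 40),
    "flag red": (220, 40, 40),
}

# Hypothetical samples from a rendered frame.
frame = [
    (92, 178, 72),    # grass
    (128, 92, 48),    # dirt
    (238, 198, 45),   # coin
    (218, 42, 38),    # flag
    (128, 128, 128),  # untextured gray character
    (255, 255, 255),  # debug overlay text
]

result = colors_found(frame, EXPECTED)
print(all(result.values()))  # the check passes despite the gray character
```

The check asks only whether expected colors are present somewhere. It never asks whether unexpected ones are, or where anything sits on screen, so a gray character and a leftover overlay go unnoticed.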
Benchmarks
Z.ai published benchmark scores comparing GLM-5.2 to Opus 4.8, GPT-5.5, and Gemini 3.1 Pro. Selected results:
| Benchmark | GLM-5.2 | Opus 4.8 |
|---|---|---|
| HLE (reasoning) | 40.5 | 49.8* |
| AIME 2026 | 99.2 | 95.7 |
| GPQA-Diamond | 91.2 | 93.6 |
| IMO AnswerBench | 91.0 | 83.5 |
GLM-5.2 leads on the math benchmarks (AIME 2026, IMO AnswerBench) but trails on general reasoning (HLE) and science (GPQA-Diamond).
Pricing and Access
GLM-5.2 is MIT-licensed, with weights available on Hugging Face and ModelScope. API pricing per 1M tokens:
| Model | Input | Cache read | Output |
|---|---|---|---|
| Claude Opus 4.8 | $5.00 | $0.50 | $25.00 |
| GLM-5.2 | $1.40 | $0.26 | $4.40 |
GLM-5.2 output tokens cost less than a fifth of what Opus output tokens do ($4.40 vs. $25 per 1M). You can serve it locally with vLLM, SGLang, or Transformers.
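To see how list prices translate into a bill, here is a rough estimator in Python. The prices come from the table above; the `cost` helper is our own. We did not record input or cache-read token counts, so only the output-token share of each run can be computed, and prices charged through a router may differ from list prices.

```python
# USD per 1M tokens, copied from the pricing table.
PRICES = {
    "Claude Opus 4.8": {"input": 5.00, "cache_read": 0.50, "output": 25.00},
    "GLM-5.2": {"input": 1.40, "cache_read": 0.26, "output": 4.40},
}


def cost(model, input_tokens=0, cache_read_tokens=0, output_tokens=0):
    """Estimate the USD cost of a run from token counts and list prices."""
    p = PRICES[model]
    total = (
        input_tokens * p["input"]
        + cache_read_tokens * p["cache_read"]
        + output_tokens * p["output"]
    )
    return total / 1_000_000


# Output-token share only, using the counts reported for the two runs.
glm_output_cost = cost("GLM-5.2", output_tokens=131_000)
opus_output_cost = cost("Claude Opus 4.8", output_tokens=216_809)

print(f"GLM-5.2 output tokens:  ${glm_output_cost:.2f}")   # $0.58
print(f"Opus 4.8 output tokens: ${opus_output_cost:.2f}")  # $5.42
```

In both runs the output tokens account for only part of the total, which suggests that input and cached context, re-sent across more than a hundred tool calls, made up much of the bill.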
What This Means
For cost-sensitive or self-hosted workflows, GLM-5.2 is a strong option — especially for math-heavy or text-only tasks. But for visual reasoning or production-grade code generation, Opus still delivers better results. The open-weight advantage means GLM-5.2 won't disappear if its vendor pivots; you can always run it yourself.
Try the games yourself:
- GLM-5.2: 3dgame-glm.d.ritzademo.com
- Opus: 3dgame-opus.d.ritzademo.com
- Source: github.com/jamesdanielwhitford/glm-5.2-vs-opus-platformers
Bottom line: If you need a cheap, open alternative for agentic coding, GLM-5.2 is worth testing. If you need polished output and visual verification, stick with Opus.





