Z.ai's GLM-5.2 runs on Huawei, skips Nvidia, and still out-codes the West 🇨🇳

Z.ai released GLM-5.2 on June 16, a 744-billion-parameter mixture-of-experts model that the Beijing-based lab says rivals frontier systems from U.S. competitors while being trained entirely on Huawei Ascend chips with no Nvidia hardware in the pipeline. The company has been on the U.S. Entity List since January 2025, and the launch coincides with a separate U.S. ban on Anthropic Fable; over the past week, that combination has driven Z.ai's stock up 90% to a new all-time high. Former Stability AI founder Emad Mostaque estimated total training costs at around $25 million, with roughly 80% of that spent on post-training.

On FrontierSWE, which scores an AI agent's ability to complete open-ended technical projects by dominance rate, GLM-5.2 reached 74.4 against Claude Opus 4.8's 75.1 and edged out GPT-5.5 at 72.6. On SWE-bench Pro, which measures autonomous resolution of real-world GitHub issues by pass rate, GLM-5.2 scored 62.1 to GPT-5.5's 58.6 and cleared its predecessor GLM-5.1's 58.4. The release tops the Artificial Analysis Intelligence Index, an aggregate of nine benchmarks assessing general model quality, making it the highest-scoring open-source model on that index to date, and OpenRouter's benchmarks place it in the same category as the now-banned Claude Fable 5.

The operational shift for developers is a 1 million-token context window, up from GLM-5.1's 200,000-token limit, which enables whole-repo navigation, multi-file refactors, and long agentic pipelines in single calls. Z.ai ships the model under an MIT license, meaning no government directive can revoke downstream access. API pricing is set at $1.40 per million input tokens and $4.40 per million output tokens, compared with $5 input and $25 output for Claude Opus 4.8, and a Coding Plan starts at around $18 per month and integrates with Claude Code, Cline, and Kilo Code.

For local deployment, Unsloth AI released 2-bit GGUF quantizations that compress the model from 1.51TB down to 238GB while retaining roughly 82% accuracy, though the quant still requires 256GB of unified memory or a matching RAM and VRAM combination such as a maxed-out M4 Ultra Mac Studio. As Decrypt reported earlier this year, Z.ai had already been training image models on Huawei's Ascend Atlas servers without any American chips, and GLM-5.2 extends that same Nvidia-free infrastructure to a frontier-scale coding model.

Z.ai's GLM-5.2 runs on Huawei, skips Nvidia, and still out-codes the West 🇨🇳

Share Article

Quick Info