GLM-5.2 Now Runnable Locally via Unsloth

Original: GLM-5.2 – How to Run Locally

Why This Matters

The guide makes it possible to run a frontier-class model locally without depending on cloud services, which bears on AI accessibility and edge computing adoption.

Unsloth released documentation for running Z.ai's GLM-5.2 model locally. The 744-billion-parameter model, which has a 1M-token context window, is offered with Unsloth's dynamic quantization; the 2-bit version requires 245GB of memory and reaches 82% accuracy relative to the full model.

Unsloth published a guide for running GLM-5.2, Z.ai's latest open-source large language model, on local hardware. GLM-5.2 has 744 billion total parameters, of which 40 billion are active, and supports a 1-million-token context window. According to the documentation, the model matches the performance of Claude 4.8 Opus, GPT-5.5, and Gemini 3.1 Pro on Artificial Analysis and other benchmarks.

Unsloth Dynamic GGUF quantization is what makes local deployment practical: the 1-bit quantization reaches 76.2% top-1 accuracy while reducing size by 86%, and the 2-bit version achieves 82% accuracy with an 84% size reduction. Hardware requirements vary by quantization level: 1-bit requires 223GB of memory, 2-bit requires 245GB, and full 8-bit requires 810GB. The 2-bit version uses 239GB of disk space and fits on a 256GB unified-memory Mac or on a single 24GB GPU paired with 256GB of system RAM.

GLM-5.2 includes three thinking modes (non-thinking, high-thinking, and max-thinking) for reasoning tasks. Recommended settings are temperature 1.0 and top_p 0.95 for general use, and temperature 1.0 and top_p 1.0 for SWE-Bench Pro benchmarks. Users can disable reasoning via command-line parameters.
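The memory figures and sampling settings quoted above can be captured in a small lookup, sketched here in Python. This is an illustration only, not part of Unsloth's guide: the function name, the preset labels, and the idea of treating GPU memory plus system RAM as one combined budget are all assumptions made for the example.

```python
from typing import Optional

# Memory requirements in GB, as quoted in the summary.
MEMORY_REQUIRED_GB = {"1-bit": 223, "2-bit": 245, "8-bit": 810}

# Recommended sampling settings as quoted in the summary.
# The preset names ("general", "swe_bench_pro") are hypothetical labels.
SAMPLING_PRESETS = {
    "general": {"temperature": 1.0, "top_p": 0.95},
    "swe_bench_pro": {"temperature": 1.0, "top_p": 1.0},
}


def largest_fitting_quant(available_gb: float) -> Optional[str]:
    """Return the highest-memory quantization level that fits the budget.

    Returns None when even the smallest listed level does not fit.
    """
    fitting = [
        (needed_gb, name)
        for name, needed_gb in MEMORY_REQUIRED_GB.items()
        if needed_gb <= available_gb
    ]
    if not fitting:
        return None
    # Tuples compare by their first element, so max() picks the largest need.
    return max(fitting)[1]


if __name__ == "__main__":
    # Assumption: a 24GB GPU and 256GB of system RAM form one 280GB budget.
    print(largest_fitting_quant(24 + 256))
    print(SAMPLING_PRESETS["general"])
```

A 256GB unified-memory Mac would be checked the same way with `largest_fitting_quant(256)`. Real-world headroom for the operating system and context cache is not modeled here.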

Source

unsloth.ai