r/LocalLLaMA · June 17, 2026 · 1 min read

Cheapest way to run GLM 5.x locally that's not a unified memory system?


This is primarily an exercise to map out the possible options, obscure as they might be, for running at least a 4-bit quant (let's say roughly IQ4_XS).

Got a CPU-only setup? Please share your experience. A Sapphire Rapids ES (56 cores) + DDR5 might be an option.
Running a multi-GPU setup with partial or complete offloading? What's your performance like?
This isn't limited to GLM 5.x; anything similarly sized is fine for the scope of this discussion.

Personally, I'm running a 5900X + 128GB DDR4 + 7900XT 20GB. The largest model I can run is MiniMax M2.7 from AesSedai at Q4_K_S - https://huggingface.co/AesSedai/MiniMax-M2.7-GGUF

For smaller stuff, it's still Qwen 3.6 27B at IQ4_XS from Unsloth/Bartowski.
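
For anyone comparing setups, a rough way to check whether a quant could fit in combined RAM + VRAM is sketched below. Everything in it is an assumption for illustration: the parameter counts are placeholders (not the real size of GLM 5.x or any other model), 4.25 bits per weight is only an approximate figure often quoted for IQ4_XS, and the 8 GiB overhead for KV cache and runtime buffers is a guess that varies a lot with context length.

```python
# Back-of-the-envelope fit check for a quantized model.
# All numbers are illustrative assumptions, not measured values.

def quant_size_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billion: float, bits_per_weight: float,
         ram_gib: float, vram_gib: float,
         overhead_gib: float = 8.0) -> bool:
    """True if weights plus an assumed overhead fit in RAM + VRAM combined."""
    needed = quant_size_gib(params_billion, bits_per_weight) + overhead_gib
    return needed <= ram_gib + vram_gib


if __name__ == "__main__":
    # Hypothetical parameter counts in billions; swap in the real figure
    # for whichever model you are considering.
    for params in (200, 400, 750):
        size = quant_size_gib(params, 4.25)
        ok = fits(params, 4.25, ram_gib=128, vram_gib=20)
        print(f"{params}B @ ~4.25 bpw: {size:.0f} GiB, fits in 128+20 GiB: {ok}")
```

This ignores the OS's own memory use and says nothing about speed; with partial offloading, throughput tends to be limited by system memory bandwidth rather than capacity.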
