r/LocalLLaMA · June 5, 2026 · 1 min read

What exactly is quantization-aware training?


This is the first time I'm hearing of it.

I also heard about the Gemma 4 QAT quants and would like to know whether any of them is a good fit for 4 GB VRAM and 16 GB RAM. I can run Gemma 4 26B MoE iq2 nl at 8.5 to 9 tps (KV cache unquantized, on the GPU) with 9 layers offloaded to the GPU.

submitted by /u/JournalistLucky5124
