Anyone seen benchmarks comparing Gemma 4 4-bit QAT vs. 8-bit standard quants?
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
I'm trying to find out whether anyone has benchmarked the Gemma 4 4-bit QAT models (via Unsloth) against standard 8-bit non-QAT quants.
I know QAT is supposed to retain most of the accuracy of the BF16 baseline, but I'm curious how a 4-bit QAT model actually fares against a traditional 8-bit PTQ quant. I've read mixed feedback across different threads, but I haven't been able to find hard numbers or a direct head-to-head comparison between the two.
Has anyone run any evaluations on this yet?
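For anyone who wants to run this themselves, here is a minimal sketch of how the two quants could be scored against each other once both have been run over the same eval questions. It assumes you already have a per-question right/wrong list for each model. The function name `paired_accuracy_diff` and the sample data are hypothetical, and the paired bootstrap is just one reasonable way to get an interval, not a method taken from any of the threads mentioned above.

```python
import random


def paired_accuracy_diff(correct_a, correct_b, n_boot=2000, seed=0):
    """Accuracy difference (A minus B) with a 95% paired bootstrap interval.

    correct_a / correct_b: per-question booleans for the SAME questions,
    in the same order (e.g. A = 4-bit QAT, B = 8-bit PTQ).
    """
    if len(correct_a) != len(correct_b) or not correct_a:
        raise ValueError("need two non-empty lists of equal length")

    n = len(correct_a)
    # Per-question difference: +1 if only A is right, -1 if only B is right.
    diffs = [int(a) - int(b) for a, b in zip(correct_a, correct_b)]
    point = sum(diffs) / n

    # Resample questions with replacement, keeping each pair together.
    rng = random.Random(seed)
    boot = []
    for _ in range(n_boot):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        boot.append(sum(sample) / n)
    boot.sort()

    low = boot[int(0.025 * n_boot)]
    high = boot[int(0.975 * n_boot) - 1]
    return point, low, high


if __name__ == "__main__":
    # Placeholder results, NOT real benchmark data.
    qat_4bit = [True, True, False, True, False, True, True, False]
    ptq_8bit = [True, False, False, True, True, True, True, True]
    diff, low, high = paired_accuracy_diff(qat_4bit, ptq_8bit)
    print(f"accuracy diff (QAT - PTQ): {diff:+.3f}  95% CI [{low:+.3f}, {high:+.3f}]")
```

If the interval comfortably spans zero, the eval set is probably too small to separate the two quants, which might explain some of the mixed feedback.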