r/LocalLLaMA · June 14, 2026 · 1 min read

Qwen 3.6 35B-A3B @ Q4 or Gemma 4 12B @ Q8?

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

Wondering how much model quantization matters here. My daily driver on a 32 GB unified-memory setup is the Qwen model, which outputs ~15 tokens per second.

Heard good things about the 12B Gemma 4 model, so I'm interested in trying it against my codebase. Given its size, I can very comfortably fit the Q8 in. Hell, I could probably run it at BF16 lol
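For a rough sense of the memory math behind that, here is a back-of-the-envelope sketch. It assumes the parameter counts implied by the model names (35B and 12B) and nominal bit widths of 4, 8, and 16 bits per weight. Real quantized files run somewhat larger because they also store per-block scales, and this ignores the KV cache, context buffers, and whatever the OS needs, so treat the numbers as lower bounds rather than measurements. The `weight_gb` helper is made up for illustration.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (10^9 bytes), weights only."""
    # params (in billions) * bits per weight / 8 bits per byte = gigabytes
    return params_billion * bits_per_weight / 8


# Assumed parameter counts (from the model names) and nominal bit widths.
estimates = {
    "Qwen 3.6 35B-A3B @ Q4": weight_gb(35, 4),
    "Gemma 4 12B @ Q8": weight_gb(12, 8),
    "Gemma 4 12B @ BF16": weight_gb(12, 16),
}

for name, gb in estimates.items():
    print(f"{name}: ~{gb:.1f} GB of weights")
```

Under those assumptions the Q8 Gemma comes in well under the Q4 Qwen, while BF16 would take roughly three quarters of 32 GB before any context is allocated.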

submitted by /u/mailto_devnull
