[3090] Gemma4 QAT + MTP quick TPS numbers [TLDR 1.2-1.8x better]
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
These last few weeks have been a godsend for 24GB (and below) GPU-poor peeps.
We're at the tipping point where GPU-poor (24GB and below) people are actually NOT poor anymore. I was already happy with Gemma 4 31b running at 40 tok/s, but now it's 70-80 tok/s. It's no wonder 3090 prices are increasing. For ref: • Hardware