r/LocalLLaMA · June 6, 2026 · 1 min read

Has there been any recent new development on which quant is considered optimal?

#gpu



I recall that in the earlier days, q4 was said to be optimal.

That is to say, if you have a:

small q8 model
medium q4 model
large q2 model

Assuming they use the same amount of GPU VRAM, medium q4 would be the best-performing model.
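
To make the "same amount of GPU VRAM" comparison concrete, here is a rough back-of-the-envelope sketch. It assumes nominal bit widths of 8, 4, and 2 bits per weight, counts weights only (no KV cache, activations, or runtime overhead), and uses a made-up 8 GiB budget. The function name `params_for_budget` is hypothetical. Real quant formats typically store extra scale data, so their effective bits per weight are somewhat higher than these nominal values.

```python
def params_for_budget(vram_gib: float, bits_per_weight: float) -> float:
    """Return how many billions of parameters fit in vram_gib (weights only)."""
    total_bits = vram_gib * 1024**3 * 8  # GiB -> bytes -> bits
    return total_bits / bits_per_weight / 1e9


# Hypothetical budget; swap in your own card's usable VRAM.
budget_gib = 8.0

# Nominal bit widths (assumption): q8 = 8, q4 = 4, q2 = 2 bits per weight.
for name, bits in [("q8", 8), ("q4", 4), ("q2", 2)]:
    billions = params_for_budget(budget_gib, bits)
    print(f"{name}: about {billions:.1f}B parameters in {budget_gib:g} GiB")
```

Under these assumptions, halving the bits per weight doubles the parameter count that fits in the same budget, which is why the question compares a small q8, a medium q4, and a large q2 model. The sketch only covers memory; it says nothing about which of the three gives the best output quality.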

I also know that Apple (crazy that I am citing Apple here, given how secretive they tend to be) was quite public about using q4-quantized models on device.

submitted by /u/takuonline
