r/LocalLLaMA · July 1, 2026 · 2 min read

I mapped which local LLMs actually fit each RAM tier, 8 to 128GB (open dataset)

I kept answering the same question for friends ("I've got a 16GB MacBook / a 3060, what can I actually run?") and got tired of guessing, so I started a spreadsheet. It grew into a real dataset, so I put it on GitHub under CC BY for anyone to use or fix.

Rule of thumb I landed on: at Q4_K_M a model needs roughly 0.6GB of memory per billion params, and you want to size to about 70% of your RAM/VRAM so the OS, context and KV cache still have room. From that, the comfortable ceiling per tier (62 local models in the set right now):

RAM	usable budget	max params that fit	models that fit
8GB	~5.6GB	~8B	23
16GB	~11GB	~14B	36
24GB	~17GB	~27B	41
32GB	~22GB	~35B	50
48GB	~34GB	~47B	53
64GB	~45GB	~70B	56
128GB	~90GB	~122B	58
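
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the rule of thumb above. The constants (0.6GB per billion params at Q4_K_M, 70% usable) come straight from the post; the function names are hypothetical and made up for illustration, not part of the dataset or the CLI. It prints the raw upper bound the formula implies, and the table's "max params" column comes out lower than that bound in every tier, so treat the table as the more conservative figure.

```python
# Rule-of-thumb constants taken from the post.
GB_PER_BILLION_PARAMS_Q4_K_M = 0.6  # approx. memory per billion params at Q4_K_M
USABLE_FRACTION = 0.70              # leave ~30% for the OS, context and KV cache


def usable_budget_gb(total_gb: float) -> float:
    """Memory the model itself may occupy on a machine with total_gb of RAM/VRAM."""
    return total_gb * USABLE_FRACTION


def max_params_billion(total_gb: float) -> float:
    """Raw upper bound on parameter count (in billions) implied by the rule of thumb."""
    return usable_budget_gb(total_gb) / GB_PER_BILLION_PARAMS_Q4_K_M


def fits(params_billion: float, total_gb: float) -> bool:
    """True if a Q4_K_M model of this size stays within the usable budget."""
    needed_gb = params_billion * GB_PER_BILLION_PARAMS_Q4_K_M
    return needed_gb <= usable_budget_gb(total_gb)


if __name__ == "__main__":
    for tier_gb in (8, 16, 24, 32, 48, 64, 128):
        print(
            f"{tier_gb:>3}GB: budget ~{usable_budget_gb(tier_gb):.1f}GB, "
            f"raw bound ~{max_params_billion(tier_gb):.0f}B params"
        )
```

For example, `fits(70, 64)` is true (about 42GB needed against a budget of about 45GB), while `fits(70, 48)` is false.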

The full thing (specific models per tier, quant, load size, the ollama command for each, plus GPU / Mac / iPhone breakdowns) is here: https://github.com/Wecko-ai/modelfit-hardware-dataset . There's a JSON API too if you'd rather pull it programmatically.

Honest caveats:

- The tok/s figures are bandwidth-derived estimates, not benchmarks I ran on every chip. Ballpark only.
- Coverage is strongest on Apple Silicon and consumer NVIDIA. AMD is newer and thinner.
- "Fits" means it loads and runs at a usable speed, not "fits at full context" (long context eats a lot more).
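
To make "bandwidth-derived" concrete: a common back-of-envelope for decode speed is memory bandwidth divided by the bytes read per token, which for a dense model is roughly the loaded model size. The sketch below assumes that approximation. It is not necessarily the exact formula used in the dataset, and the function name, the `efficiency` knob, and the 100GB/s figure are all hypothetical, chosen only for illustration.

```python
def estimated_tokens_per_second(
    bandwidth_gb_s: float,
    model_size_gb: float,
    efficiency: float = 1.0,
) -> float:
    """Back-of-envelope decode speed for a memory-bound dense model.

    Assumes every generated token reads the full set of weights once, so
    throughput is capped at bandwidth / model size. `efficiency` (0..1) is a
    hypothetical fudge factor for real-world overhead.
    """
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return efficiency * bandwidth_gb_s / model_size_gb


if __name__ == "__main__":
    # Hypothetical machine with 100 GB/s of memory bandwidth running an
    # 8B model at Q4_K_M (8 * 0.6 = 4.8 GB by the post's rule of thumb).
    print(f"~{estimated_tokens_per_second(100, 8 * 0.6):.1f} tok/s upper bound")
```

Real numbers land below this ceiling, which is why the figures are ballpark only.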

If something looks off (a model that should fit and doesn't, a quant I got wrong, a card I'm missing), tell me or open a PR. That's the whole point of it being open.

(full disclosure: I also built a site and CLI on top of this, modelfit.io, but the dataset itself is the useful part and it's free to use)

submitted by /u/WecK0