r/LocalLLaMA

I mapped which local LLMs actually fit each RAM tier, 8 to 128GB (open dataset)

I kept answering the same question for friends ("I've got a 16GB MacBook / a 3060, what can I actually run?") and got tired of guessing, so I started a spreadsheet. It grew into a real dataset, so I put it on GitHub under CC BY for anyone to use or fix.

Rule of thumb I landed on: at Q4_K_M a model needs roughly 0.6GB of memory per billion params, and you want to size to about 70% of your RAM/VRAM so the OS, context and KV cache still have room. From that, here's the comfortable ceiling per tier (the set has 62 local models right now):

RAM      Usable budget   Max params that fit   Models that fit
8GB      ~5.6GB          ~8B                   23
16GB     ~11GB           ~14B                  36
24GB     ~17GB           ~27B                  41
32GB     ~22GB           ~35B                  50
48GB     ~34GB           ~47B                  53
64GB     ~45GB           ~70B                  56
128GB    ~90GB           ~122B                 58
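
If you want to apply the rule of thumb to a model that isn't in the table, here's a minimal sketch in Python. The two constants come straight from the post (0.6GB per billion params at Q4_K_M, 70% of RAM/VRAM as the budget); the function names are made up for illustration and aren't part of the dataset or any tool. Note that the raw formula gives a somewhat higher ceiling than the "max params" column, presumably because that column reflects models actually in the set.

```python
# Rule-of-thumb fit check, using the two numbers quoted in the post.
# Function names here are hypothetical, chosen only for this sketch.

GB_PER_BILLION_PARAMS_Q4_K_M = 0.6  # approx. memory per billion params at Q4_K_M
USABLE_FRACTION = 0.70              # leave ~30% for the OS, context and KV cache


def usable_budget_gb(total_gb: float) -> float:
    """Memory you can comfortably give to model weights."""
    return total_gb * USABLE_FRACTION


def estimated_load_gb(params_billion: float) -> float:
    """Approximate load size of a Q4_K_M model with this many params."""
    return params_billion * GB_PER_BILLION_PARAMS_Q4_K_M


def fits(params_billion: float, total_gb: float) -> bool:
    """True if the estimated load size is within the usable budget."""
    return estimated_load_gb(params_billion) <= usable_budget_gb(total_gb)


if __name__ == "__main__":
    for ram_gb, params_b in [(8, 8), (16, 14), (48, 70), (64, 70)]:
        verdict = "fits" if fits(params_b, ram_gb) else "does not fit"
        print(f"{params_b}B on {ram_gb}GB: {verdict}")
```

This only captures the coarse estimate; real load sizes vary by architecture and quant, which is what the per-model rows in the dataset are for.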

The full thing (specific models per tier, quant, load size, the ollama command for each, plus GPU / Mac / iPhone breakdowns) is here: https://github.com/Wecko-ai/modelfit-hardware-dataset . There's a JSON API too if you'd rather pull it programmatically.

Honest caveats:

  • the tok/s figures are bandwidth-derived estimates, not benchmarks I ran on every chip. Ballpark only.
  • coverage is strongest on Apple Silicon and consumer NVIDIA. AMD is newer and thinner.
  • "fits" means it loads and runs at a usable speed, not "fits at full context" (long context eats a lot more).
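
To make the first and last caveats concrete, here's a rough sketch of the two usual back-of-envelope calculations. The post doesn't spell out the exact formula behind its tok/s figures, so the bandwidth one is my assumption of the common approximation (every generated token has to read all the weights once, so speed is capped at bandwidth divided by model size). The KV cache formula is the standard one for a transformer with grouped-query attention. The example numbers (100 GB/s, a 32-layer model with 8 KV heads of dimension 128) are illustrative, not taken from the dataset.

```python
# Back-of-envelope estimates; both functions are illustrative sketches,
# not the dataset's actual method.


def decode_tokens_per_second(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on decode speed, assuming generation is memory-bound
    and each token requires reading all model weights once."""
    return bandwidth_gb_s / model_gb


def kv_cache_gib(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_tokens: int,
    bytes_per_value: int = 2,  # 2 bytes = fp16 cache
) -> float:
    """Size of the KV cache in GiB for a given context length.
    The factor of 2 covers one key and one value tensor per layer."""
    bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return bytes_per_token * context_tokens / 1024**3


if __name__ == "__main__":
    # Hypothetical machine with 100 GB/s memory bandwidth, ~4.8GB model.
    print(f"~{decode_tokens_per_second(100, 4.8):.1f} tok/s ceiling")

    # Illustrative 32-layer model, 8 KV heads, head dim 128, fp16 cache.
    for ctx in (8192, 32768):
        print(f"{ctx} tokens of context: {kv_cache_gib(32, 8, 128, ctx):.1f} GiB")
```

Real throughput lands below the bandwidth ceiling, and the cache grows linearly with context, which is why a model that "fits" at a short context can stop fitting at a long one.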

If something looks off (a model that should fit and doesn't, a quant I got wrong, a card I'm missing), tell me or open a PR. That's the whole point of it being open.

(full disclosure: I also built a site and CLI on top of this, modelfit.io, but the dataset itself is the useful part and it's free to use)

submitted by /u/WecK0