r/LocalLLaMA · July 3, 2026

Uh... Honey, how do you feel about takeout?

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

- 2x RTX Pro 6000 Max-Q (96GB)
- 8x RTX 3090 (24GB)
- 2x RTX 5090 (32GB)

- 3 PSUs
- 128GB DDR5 RDIMM RAM (4-channel)
- Threadripper 9960X
- 1x Ryobi Portable Fan
- 1x large Uber Eats bill

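448GB VRAM total.
Running MiniMax M3 in AWQ-INT4 on vLLM, using pipeline parallelism (PP) across tensor-parallel (TP) groups of 2, so 12 GPUs make 6 pipeline stages of 2 GPUs each.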
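For anyone trying to reproduce the layout: a minimal Python sketch of how 12 mixed cards could be paired into TP groups of 2 and stacked into PP stages. `Gpu`, `plan_stages`, and `split_layers` are hypothetical helpers, the 60-layer count is a placeholder (not MiniMax M3's real layer count), and splitting layers in proportion to each stage's VRAM is just one heuristic; it ignores the extra room each stage needs for KV cache and activations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    vram_gb: int


# Inventory from the post.
GPUS = (
    [Gpu("RTX Pro 6000 Max-Q", 96)] * 2
    + [Gpu("RTX 3090", 24)] * 8
    + [Gpu("RTX 5090", 32)] * 2
)
TP_SIZE = 2  # tensor-parallel group size from the post


def plan_stages(gpus, tp_size):
    """Pair up cards into TP groups; each group becomes one PP stage."""
    if len(gpus) % tp_size:
        raise ValueError("GPU count must be a multiple of the TP size")
    # Sorting by name keeps matching cards adjacent; with an even count of
    # each model (true here), no TP group mixes card models.
    ordered = sorted(gpus, key=lambda g: g.name)
    return [ordered[i:i + tp_size] for i in range(0, len(ordered), tp_size)]


def split_layers(stages, num_layers):
    """Hypothetical heuristic: layers per stage proportional to stage VRAM."""
    mem = [sum(g.vram_gb for g in stage) for stage in stages]
    counts = [num_layers * m // sum(mem) for m in mem]
    # Integer division leaves a few layers over; give them to the biggest stages.
    leftover = num_layers - sum(counts)
    for i in sorted(range(len(mem)), key=lambda i: -mem[i])[:leftover]:
        counts[i] += 1
    return counts


stages = plan_stages(GPUS, TP_SIZE)
print("total VRAM (GB):", sum(g.vram_gb for g in GPUS))  # 448
print("PP size:", len(stages))  # 6
# 60 is a placeholder layer count. Stage order: 3090 pairs, 5090 pair, Pro 6000 pair.
print("layers per stage:", ",".join(map(str, split_layers(stages, 60))))  # 7,6,6,6,9,26
```

In vLLM terms this corresponds to `--tensor-parallel-size 2 --pipeline-parallel-size 6`. By default vLLM splits layers roughly evenly across stages; recent versions also read a comma-separated per-stage layer count from the `VLLM_PP_LAYER_PARTITION` environment variable, which is where an uneven split like the one above would go (check your version). Stage order likely follows device order, so `CUDA_VISIBLE_DEVICES` would need to list the cards in the same order the plan assumes.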

~30 tok/s single stream
~960 tok/s batched
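Quick arithmetic on those two numbers, as a sketch: if no stream decodes faster under load than it does alone (an assumption, though usually a safe one), hitting ~960 tok/s aggregate means at least 960 / 30 = 32 streams in flight. The 1,500-token reply length below is a hypothetical example, not from the post.

```python
import math

SINGLE_STREAM_TOK_S = 30  # "~30 tok/s single stream" (from the post)
BATCH_TOK_S = 960         # "~960 tok/s batched" (from the post)


def min_concurrent_streams(batch_tok_s, single_tok_s):
    """Lower bound on streams in flight, assuming no stream decodes
    faster under load than it does alone."""
    return math.ceil(batch_tok_s / single_tok_s)


def reply_seconds(tokens, tok_s):
    """Wall-clock time to stream `tokens` at a steady `tok_s`."""
    return tokens / tok_s


print(min_concurrent_streams(BATCH_TOK_S, SINGLE_STREAM_TOK_S))  # 32
print(reply_seconds(1500, SINGLE_STREAM_TOK_S))                  # 50.0 (hypothetical reply length)
```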

I can fit 1M context for one user, but ideally want 4x concurrency. TBD where context will land… or my marriage…
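A rough sketch of the context-vs-concurrency trade-off, assuming the "1M context for one user" figure is the whole KV-cache pool (taken as 1,000,000 tokens) and that all users share that one pool. In the worst case, where every user fills their context at once, each gets a quarter of it. `context_per_user` and `fits` are hypothetical helpers.

```python
# Assumption: "1M context for one user" means the KV-cache pool holds about
# 1,000,000 tokens in total, shared by every concurrent user.
KV_POOL_TOKENS = 1_000_000
TARGET_USERS = 4  # "ideally want 4x concurrency" (from the post)


def context_per_user(pool_tokens, users):
    """Worst case: every user fills their context at once, split evenly."""
    return pool_tokens // users


def fits(pool_tokens, users, context_each):
    """True if `users` full-length contexts fit in the pool simultaneously."""
    return users * context_each <= pool_tokens


print(context_per_user(KV_POOL_TOKENS, TARGET_USERS))  # 250000
print(fits(KV_POOL_TOKENS, TARGET_USERS, 131_072))     # True
print(fits(KV_POOL_TOKENS, TARGET_USERS, 262_144))     # False
```

On the vLLM side, `--max-model-len` caps per-request context and `--max-num-seqs` caps concurrent sequences. Because KV cache is allocated in pages as requests grow, the per-request cap can be set above the worst-case quarter; the cost is that requests get preempted if all four actually fill up at the same time.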
