r/LocalLLaMA · 1 min read

Uh.. Honey, how do you feel about takeout?

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

- 2x RTX Pro 6000 Max-Q (96GB)
- 8x RTX 3090 (24GB)
- 2x RTX 5090 (32GB)

- 3 PSUs
- 128GB DDR5 RDIMM RAM (4-channel)
- Threadripper 9960X
- 1x Ryobi Portable Fan
- 1x large Uber Eats bill

448GB VRAM
Running MiniMax M3 in AWQ-INT4 on vLLM, using pipeline parallelism (PP) across tensor-parallel (TP) groups of 2.
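
A quick sanity check of the numbers above, as a minimal sketch: it sums the listed cards and derives the pipeline depth implied by TP groups of 2. The `summarize` and `vllm_args` helpers and the `<model-path>` placeholder are hypothetical names for illustration; the only vLLM details used are the `--tensor-parallel-size` and `--pipeline-parallel-size` flags, and the real launch command may well carry more options than shown.

```python
# GPU inventory copied from the post: (model, count, VRAM per card in GB)
GPUS = [
    ("RTX Pro 6000 Max-Q", 2, 96),
    ("RTX 3090", 8, 24),
    ("RTX 5090", 2, 32),
]
TP_SIZE = 2  # "TP groups of 2" from the post


def summarize(gpus, tp_size):
    """Return (gpu_count, total_vram_gb, pipeline_stages)."""
    n_gpus = sum(count for _, count, _ in gpus)
    total_vram = sum(count * gb for _, count, gb in gpus)
    if n_gpus % tp_size:
        raise ValueError("GPU count must be divisible by the TP group size")
    # Each pipeline stage is one TP group, so stages = GPUs / TP size.
    return n_gpus, total_vram, n_gpus // tp_size


def vllm_args(model, tp_size, pp_size):
    """Build a vLLM launch command (sketch; model path is a placeholder)."""
    return [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tp_size),
        "--pipeline-parallel-size", str(pp_size),
    ]


n_gpus, total_vram, pp_size = summarize(GPUS, TP_SIZE)
print(f"{n_gpus} GPUs, {total_vram} GB VRAM, TP={TP_SIZE} x PP={pp_size}")
print(" ".join(vllm_args("<model-path>", TP_SIZE, pp_size)))
```

With 12 cards in pairs, that works out to 6 pipeline stages, and the VRAM total matches the 448GB figure.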

~30 tok/s single stream
~960 tok/s batched

Can get 1M context for one user, but ideally I want 4x concurrency. TBD where context will land… or my marriage…
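
For a rough feel of that trade-off, here is a back-of-the-envelope sketch. It assumes (this is not stated in the post) that the KV cache holds about 1M tokens in total and that concurrent requests split that pool evenly; `per_request_context` is a hypothetical helper, and real numbers depend on how the KV cache ends up sized across the pipeline stages.

```python
def per_request_context(kv_budget_tokens: int, concurrency: int) -> int:
    """Even split of a fixed KV-cache token budget across concurrent requests."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return kv_budget_tokens // concurrency


# Assumption: the 1M single-user context roughly equals the whole KV pool.
KV_BUDGET_TOKENS = 1_000_000

for users in (1, 2, 4):
    tokens = per_request_context(KV_BUDGET_TOKENS, users)
    print(f"{users} concurrent: ~{tokens:,} tokens each")
```

Under those assumptions, 4x concurrency would land somewhere around 250k tokens per request if every request ran at full length.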

submitted by /u/MotorcyclesAndBizniz