Maximizing performance of 2x3090 + NVLink
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Hey all, I have built myself a decent rig with the following specs:
- Ubuntu 24.04
- 2x RTX 3090 Founders Edition with NVLink
- Ryzen 9 7950X3D
- 64GB DDR5
I am currently routing my display through an eGPU to maximize available VRAM. My current go-to is Qwen 3.6 27B Q8_0 with MTP, using ik_llama’s graph split and -ngl 99. It works very well with pi and the output quality is very good, but generation peaks at about 60 tok/s in very short bursts and averages around 40-45 tok/s.
I imagine that my setup, minus maybe the NVLink, is pretty common on this sub, so I’m curious to hear how people are squeezing more performance out of their cards, or whether the numbers I’m seeing are par for the course.
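As a rough yardstick for "par for the course", here is a back-of-the-envelope sketch of the decode ceiling. It assumes token generation is purely memory-bandwidth bound and that every weight is read once per forward pass. The parameter count, the spec-sheet bandwidth figure, and the function names are assumptions for illustration, not measurements from this rig.

```python
# Rough decode-speed ceiling under a memory-bandwidth-bound assumption.
# All constants below are assumptions, not measured values.

PARAMS = 27e9               # assumed parameter count for a "27B" model
BITS_PER_WEIGHT = 8.5       # Q8_0: 32 int8 weights plus one fp16 scale per block
GPU_BANDWIDTH_GBPS = 936.0  # RTX 3090 spec-sheet memory bandwidth in GB/s
NUM_GPUS = 2


def weight_gigabytes(params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def ceiling_tok_per_s(weights_gb: float, bandwidth_gbps: float,
                      gpus_reading_in_parallel: int) -> float:
    """Upper bound on forward passes per second if weight reads are the only cost."""
    return bandwidth_gbps * gpus_reading_in_parallel / weights_gb


weights_gb = weight_gigabytes(PARAMS, BITS_PER_WEIGHT)

# Layer split: the GPUs take turns, so only one reads weights at any moment.
layer_split = ceiling_tok_per_s(weights_gb, GPU_BANDWIDTH_GBPS, 1)

# Tensor-parallel style split: both GPUs read their half at the same time.
parallel_split = ceiling_tok_per_s(weights_gb, GPU_BANDWIDTH_GBPS, NUM_GPUS)

print(f"weights:        {weights_gb:.1f} GB")
print(f"layer split:    {layer_split:.1f} tok/s ceiling")
print(f"parallel split: {parallel_split:.1f} tok/s ceiling")
```

Real throughput should land below these ceilings because of KV cache reads, kernel overhead, and inter-GPU synchronization. MTP can push the observed rate above the per-pass figure, since one verified forward pass may yield more than one token. Whether the parallel figure applies depends on the split mode actually keeping both GPUs busy at once.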