For dual GPUs, will there be any big impact to inference speeds when running in PCIe 5.0 x8/x4 vs x8/x8?
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
I bought the Biostar Z890 Valkyrie because it was on sale and had three PCIe 5.0 slots connected to the CPU (x16 or x8/x8 or x8/x4/x4), which I thought would be great for running dual GPUs for LLM inference. The problem is that now I want to add a SATA expansion card to the bottom PCIe slot, but this will drop the middle slot to x4 speeds. Would I see a performance hit for inference if I run the two GPUs in x8/x4 mode, both when the model if fully loaded into VRAM and when I have to use partial offloading?
[link] [comments]
More from r/LocalLLaMA
-
What's in your RAG?
Jul 2
-
Palantir CEO rages against closed models
Jul 2
-
A cheap trick for reliable structured output: feed the validation error back into the retry
Jul 2
-
SenseNova-U1-8b-MoT-Infographic-V2 (released yesterday) - An open source SOTA beast for infographic design and image editing.
Jul 2
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.