r/LocalLLaMA · June 30, 2026 · 3 min read

EPYC hybrid system benches and optimal CPU

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

I've finally built my semi-budget setup, though not everything went as expected. I first purchased an EPYC 9555 QS, but I was scammed and the CPU arrived dead. At that point I could only afford a placeholder 9135 with 2 CCDs.

That's why I'm interested in inference numbers from people who bought a proper CPU. Everyone says that 16 CCDs with fewer cores (9175F) is the best choice, but based on my research the difference is not that big. On the other hand, I saw a comment from someone who benched GLM-5.2 on a 9684X (CPU only) and got 12 t/s. My setup got me around 7 t/s CPU-only. I also read in some GitHub thread that the 9555 would be better than the 9355.

https://openbenchmarking.org/ only has benchmarks for small models.

My setup:
768 GB DDR5-4800, EPYC 9135, RTX 5090

Test command (ik_llama and Ubergarm/Kimi-K2.6 Q4_X):
./llama-sweep-bench \
  --model Ubergarm/Kimi-K2.6-Q4_X-00001-of-00014.gguf \
  --no-mmap --merge-qkv \
  -mla 3 -amb 512 \
  -b 4096 -ub 4096 \
  -ctk f16 -ctv f16 -c 32000 \
  -ngl 999 -ncmoe 999 \
  --threads 16 \
  --threads-batch 28 \
  --warmup-batch \
  -n 128

Numbers: b 4096

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
| 4096 | 128 | 0 | 15.701 | 260.87 | 7.168 | 17.86 |
| 4096 | 128 | 4096 | 16.128 | 253.96 | 7.260 | 17.63 |
| 4096 | 128 | 8192 | 16.296 | 251.35 | 7.457 | 17.16 |
| 4096 | 128 | 16384 | 17.006 | 240.86 | 7.519 | 17.02 |
| 4096 | 128 | 32768 | 18.397 | 222.65 | 7.845 | 16.32 |
| 4096 | 128 | 65536 | 20.240 | 202.37 | 8.298 | 15.43 |
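
For anyone comparing their own runs: the two speed columns appear to be plain token counts divided by the measured times (S_PP = PP / T_PP, S_TG = TG / T_TG). A minimal Python sketch that recomputes them from the b 4096 rows; the `throughput` helper is just an illustrative name, not something from ik_llama, and small last-digit differences are expected because the printed times are rounded.

```python
# Rows copied from the b 4096 table: (PP, TG, N_KV, T_PP s, T_TG s)
rows = [
    (4096, 128, 0, 15.701, 7.168),
    (4096, 128, 4096, 16.128, 7.260),
    (4096, 128, 8192, 16.296, 7.457),
    (4096, 128, 16384, 17.006, 7.519),
    (4096, 128, 32768, 18.397, 7.845),
    (4096, 128, 65536, 20.240, 8.298),
]


def throughput(tokens, seconds):
    """Tokens per second for a given token count and wall time."""
    return tokens / seconds


for pp, tg, n_kv, t_pp, t_tg in rows:
    s_pp = throughput(pp, t_pp)  # prompt processing speed
    s_tg = throughput(tg, t_tg)  # token generation speed
    print(f"N_KV={n_kv:>6}  S_PP={s_pp:7.2f} t/s  S_TG={s_tg:5.2f} t/s")
```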

Numbers: b 8192

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
| 8192 | 128 | 0 | 18.564 | 441.28 | 7.081 | 18.08 |
| 8192 | 128 | 8192 | 20.323 | 403.10 | 7.405 | 17.29 |
| 8192 | 128 | 16384 | 21.115 | 387.96 | 7.525 | 17.01 |

Previous 4090 numbers:

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
| 4096 | 128 | 0 | 19.716 | 207.75 | 7.269 | 17.61 |
| 4096 | 128 | 4096 | 20.324 | 201.54 | 7.379 | 17.35 |
| 4096 | 128 | 8192 | 20.717 | 197.71 | 7.512 | 17.04 |

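I've also found numbers for DDR5-6400 and an EPYC 9355:

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
| 4096 | 128 | 0 | 14.985 | 273.35 | 6.326 | 20.24 |
| 4096 | 128 | 4096 | 15.316 | 267.44 | 6.453 | 19.83 |
| 4096 | 128 | 8192 | 15.662 | 261.52 | 6.614 | 19.35 |
| 4096 | 128 | 16384 | 16.399 | 249.77 | 6.719 | 19.05 |
| 4096 | 128 | 32768 | 17.656 | 231.98 | 6.989 | 18.31 |
| 4096 | 128 | 65536 | 20.666 | 198.20 | 8.107 | 15.79 |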
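
To put "the difference is not so big" into numbers, here is a rough Python sketch that divides the 9355 / DDR5-6400 generation speeds by my 9135 / DDR5-4800 ones at each context depth, and prints the raw memory transfer-rate ratio (6400 / 4800) next to it. This is not a controlled comparison: the GPU and exact settings of the 9355 run are unknown to me, so treat the ratios as a ballpark only. The variable and function names are made up for the example.

```python
# S_TG (t/s) keyed by N_KV, copied from the two b 4096 tables in this post.
tg_9135_ddr5_4800 = {0: 17.86, 4096: 17.63, 8192: 17.16,
                     16384: 17.02, 32768: 16.32, 65536: 15.43}
tg_9355_ddr5_6400 = {0: 20.24, 4096: 19.83, 8192: 19.35,
                     16384: 19.05, 32768: 18.31, 65536: 15.79}

# Ratio of DDR5 transfer rates only; says nothing about CCD count or GPU.
mem_clock_ratio = 6400 / 4800


def speedup(new, old):
    """Per-depth ratio new/old for the context depths present in both runs."""
    return {n_kv: new[n_kv] / old[n_kv] for n_kv in old if n_kv in new}


ratios = speedup(tg_9355_ddr5_6400, tg_9135_ddr5_4800)

for n_kv, ratio in ratios.items():
    print(f"N_KV={n_kv:>6}  TG speedup={ratio:.3f}x  "
          f"(memory clock ratio {mem_clock_ratio:.3f}x)")
```

With these tables, every TG ratio comes out below the memory clock ratio, which fits what I found in my research, but a same-GPU run on a 9355 or 9175F would be needed to say anything firm.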

Another setup with the same ik_llama and Kimi-K2.6 Q4_X, EPYC 9175F and RTX 6000 Pro:

Token generation in the 17.9 to 21 t/s range, and cold PP in the 223 to 377 t/s range.

submitted by /u/iVoider