r/LocalLLaMA · June 21, 2026 · 1 min read

ROCm vs Vulkan vs vLLM on Dual R9700s

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

Just wanted to share the numbers I got running Qwen3.6 35B-A3B and Qwen3.6 27B, and the big increase I saw going to vLLM. I was only expecting better concurrency, but ended up with much better speeds.

llama.cpp services running ROCm and Vulkan

Model	Backend	Gen
35B-A3B Q6_K_XL (MTP)	ROCm	~106 t/s
27B Q6_K_XL (MTP)	ROCm	~44 t/s
35B-A3B Q6_K_XL (MTP)	Vulkan	~87 t/s
27B Q6_K_XL (MTP)	Vulkan	~41 t/s

vLLM

Model	Backend	Gen
35B-A3B MoE FP8 (MTP)	ROCm + AITER	156 t/s
27B FP8 (MTP)	ROCm + AITER	69 t/s
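
To put the increase in relative terms, here is a quick sketch that divides the vLLM generation speed by each llama.cpp figure from the tables above. The dictionary and function names are made up for illustration, and the comparison is not apples-to-apples: the llama.cpp runs use Q6_K_XL quants while the vLLM runs use FP8.

```python
# Generation speeds in tokens/s, copied from the two tables above.
# Note: llama.cpp rows are Q6_K_XL, vLLM rows are FP8, so the quantization differs.
llama_cpp_tps = {
    ("35B-A3B", "ROCm"): 106,
    ("27B", "ROCm"): 44,
    ("35B-A3B", "Vulkan"): 87,
    ("27B", "Vulkan"): 41,
}
vllm_tps = {"35B-A3B": 156, "27B": 69}


def speedup(new_tps, old_tps):
    """Return how many times faster new_tps is than old_tps."""
    return new_tps / old_tps


# Ratio of vLLM speed to each llama.cpp backend, per model.
ratios = {
    (model, backend): speedup(vllm_tps[model], tps)
    for (model, backend), tps in llama_cpp_tps.items()
}

for (model, backend), ratio in ratios.items():
    print(f"{model}: vLLM vs llama.cpp {backend} = {ratio:.2f}x")
```

By this arithmetic, vLLM comes out roughly 1.5x to 1.8x faster than the llama.cpp numbers, depending on model and backend.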

EDIT: here are prefill speeds, since several people were asking.

Pulled these from the vLLM logger.

Prompt size	Prefill speed	Calculation (tokens ÷ TTFT)
~10K	~10,000 tok/s	10,033 ÷ 0.98s
~40K	~6,600 tok/s	39,997 ÷ 6.0s
~70K	~5,500 tok/s	70,027 ÷ 12.7s
~100K	~4,400 tok/s	99,991 ÷ 22.9s
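
The prefill column is just prompt tokens divided by time to first token (TTFT). A minimal sketch of that arithmetic, using the token counts and TTFT values from the table above (the variable and function names are illustrative, and TTFT can include some overhead beyond pure prefill, so treat the results as approximate):

```python
# (prompt_tokens, ttft_seconds) pairs copied from the table above.
measurements = [
    (10033, 0.98),
    (39997, 6.0),
    (70027, 12.7),
    (99991, 22.9),
]


def prefill_speed(prompt_tokens, ttft_seconds):
    """Approximate prefill throughput in tokens/s as tokens divided by TTFT."""
    return prompt_tokens / ttft_seconds


speeds = [prefill_speed(tokens, ttft) for tokens, ttft in measurements]

for (tokens, ttft), speed in zip(measurements, speeds):
    print(f"{tokens:>7} tokens / {ttft:>5.2f}s = {speed:,.0f} tok/s")
```

The unrounded results line up with the approximate figures in the table, and show prefill throughput dropping as the prompt gets longer.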

I am curious what speeds others are seeing on Qwen3.6 35B-A3B and 27B.

submitted by /u/whodoneit1