Qwen 3.6 benchmarks on 2x RTX PRO 6000
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Got a chance to play around with a 2x RTX PRO 6000 setup, so I'm sharing some numbers for Qwen 3.6.
All of these were run using the latest stable vLLM backend. This was for a personal project.
Qwen 3.6 27B BF16 (Original without any quantization)
------
MTP - Off | 64 concurrency | 1600 tps generation
MTP - 2 | 32 concurrency | 1400 tps generation
MTP - 2 | 64 concurrency | 1800 tps generation
------
Qwen 3.6 35B BF16
MTP - Off | 64 concurrency | 2700 tps generation
MTP - Off | 128 concurrency | 3500 tps generation (Prompt Processing 30,000 tps)
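For anyone who wants to compare these runs per request rather than in total, here is a rough sketch of the arithmetic. It assumes each tps figure above is the aggregate generation throughput summed over all concurrent requests, which the post does not state explicitly. The names RESULTS, per_request_tps and report are made up for illustration.

```python
# Figures copied from the post: (model, MTP setting, concurrency, generation tok/s).
# Assumption: each tok/s figure is the aggregate across all concurrent requests.
RESULTS = [
    ("Qwen 3.6 27B BF16", "off", 64, 1600),
    ("Qwen 3.6 27B BF16", "2", 32, 1400),
    ("Qwen 3.6 27B BF16", "2", 64, 1800),
    ("Qwen 3.6 35B BF16", "off", 64, 2700),
    ("Qwen 3.6 35B BF16", "off", 128, 3500),
]


def per_request_tps(total_tps, concurrency):
    """Average generation speed seen by one request, under the aggregate assumption."""
    return total_tps / concurrency


def report(results=RESULTS):
    """Return one formatted line per benchmark run."""
    lines = []
    for model, mtp, concurrency, total_tps in results:
        each = per_request_tps(total_tps, concurrency)
        lines.append(
            f"{model} | MTP {mtp} | {concurrency} concurrent | "
            f"{total_tps} tok/s total | {each:.1f} tok/s per request"
        )
    return lines


# Relative gain from MTP = 2 on the 27B model at 64 concurrency (1800 vs 1600 tok/s).
mtp_speedup = 1800 / 1600

if __name__ == "__main__":
    for line in report():
        print(line)
    print(f"27B, 64 concurrency, MTP 2 vs off: {mtp_speedup:.3f}x")
```

Under that assumption, the 27B run without MTP at 64 concurrency works out to 25 tok/s per request, and turning on MTP = 2 at the same concurrency is a 1.125x gain in total generation throughput.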