Some llama.cpp B70 SYCL benchmarks
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
build: dd4623a74 (9640)
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| gemma4 12B Q8_0 | 11.78 GiB | 11.91 B | SYCL | -1 | pp512 | 1578.19 ± 7.82 |
| gemma4 12B Q8_0 | 11.78 GiB | 11.91 B | SYCL | -1 | tg128 | 32.43 ± 0.07 |
| gemma4 26B.A4B Q8_0 | 25.00 GiB | 25.23 B | SYCL | -1 | pp512 | 1332.35 ± 8.80 |
| gemma4 26B.A4B Q8_0 | 25.00 GiB | 25.23 B | SYCL | -1 | tg128 | 40.13 ± 0.09 |
| gemma4 E2B Q8_0 | 4.69 GiB | 4.65 B | SYCL | -1 | pp512 | 5662.45 ± 23.05 |
| gemma4 E2B Q8_0 | 4.69 GiB | 4.65 B | SYCL | -1 | tg128 | 109.14 ± 0.26 |

| model | size | params | backend | ngl | ot | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------------- | --------------: | -------------------: |
| qwen35moe 35B.A3B Q8_0 | 34.36 GiB | 34.66 B | SYCL | 99 | `blk\.(3[4-9])\.ffn_(gate\|up\|down)_exps=CPU` | pp512 | 563.48 ± 14.58 |
| qwen35moe 35B.A3B Q8_0 | 34.36 GiB | 34.66 B | SYCL | 99 | `blk\.(3[4-9])\.ffn_(gate\|up\|down)_exps=CPU` | tg128 | 44.67 ± 0.04 |

| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen35 27B Q8_0 | 27.04 GiB | 27.32 B | SYCL | -1 | pp512 | 778.20 ± 0.99 |
| qwen35 27B Q8_0 | 27.04 GiB | 27.32 B | SYCL | -1 | tg128 | 15.42 ± 0.01 |
Just FYI: it runs OK, but it could be better.
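For a rough sense of how much headroom is left on token generation, here is a small sketch that multiplies model size by tg128 speed. It assumes (my assumption, not something measured in this post) that gemma4 12B and qwen35 27B are dense models whose weights are all read once per generated token, and it ignores KV cache and activation traffic, so treat the result as a loose lower bound on the memory bandwidth actually used. The MoE and E2B rows are left out because they do not touch every weight per token.

```python
# Back-of-the-envelope estimate: weight bytes streamed per second during tg128.
# Assumption: dense model, every weight is read once per generated token.
# Sizes (GiB) and tg128 speeds (tokens/s) are copied from the tables above.
ROWS = [
    ("gemma4 12B Q8_0", 11.78, 32.43),
    ("qwen35 27B Q8_0", 27.04, 15.42),
]


def effective_gib_per_s(size_gib: float, tokens_per_s: float) -> float:
    """Approximate weight-read bandwidth in GiB/s for a dense model."""
    return size_gib * tokens_per_s


for name, size_gib, tps in ROWS:
    print(f"{name}: ~{effective_gib_per_s(size_gib, tps):.0f} GiB/s")
```

Comparing these figures against the card's rated memory bandwidth gives an idea of how far tg is from being bandwidth-bound.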