r/LocalLLaMA · June 5, 2026 · 1 min read

sycl : port multi-column MMVQ from CUDA backend (~45% speculative decoding speedup on Intel Arc) by masonmilby · Pull Request #21845 · ggml-org/llama.cpp

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

Saw this on other sub so posting here.

For Intel ARC card holders. Big boost so update llama.cpp version(b9519 onwards)

Discussion (0)

No comments yet. Sign in and be the first to say something.