The Eagle(3) has landed (for Qwen)
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
https://github.com/ggml-org/llama.cpp/releases/tag/b9723
Available in the latest release. Enabled via:
--spec-type draft-eagle3
You'll need to give it a draft model. There are issues with unsloth + eagle at the moment, so I've personally tested against:
Model: https://huggingface.co/lmstudio-community/Qwen3.6-27B-GGUF
Draft: https://huggingface.co/wimmmm/Ex0bit-Qwen3.6-27B-PRISM-EAGLE3-GGUF
Specify your draft with -md or --model-draft
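For anyone who wants to see the flags in one place, here's a rough sketch of how the full command might be assembled. It assumes you're launching llama-server and that both GGUFs are already downloaded; the two file paths are placeholders I made up, not the real file names in those repos. It only prints the command so you can check it before running anything.

```shell
# Sketch only: both paths are hypothetical placeholders, swap in your own files.
MODEL="models/main-model.gguf"      # main model GGUF (placeholder path)
DRAFT="models/eagle3-draft.gguf"    # EAGLE3 draft GGUF (placeholder path)
SPEC_TYPE="draft-eagle3"            # spec type from the post

# Assumes the llama-server binary; -m is the main model, -md the draft model.
CMD="llama-server -m $MODEL -md $DRAFT --spec-type $SPEC_TYPE"

# Print instead of executing so the command can be reviewed first.
echo "$CMD"
```

Once the printed line looks right, paste it into your terminal along with whatever other options you normally use.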
Performance-wise, I currently get very similar tps to draft-mtp. Tensor parallelism, which I rely on a lot, isn't supported yet and hits an assert. The draft model also uses some extra VRAM, so it's not ideal if your setup is already tight on memory. I'm keen to see how this develops over time!
Don't forget you can also stack up multiple types of speculative decoding:
--spec-type draft-eagle3,ngram-mod