r/LocalLLaMA · June 21, 2026 · 1 min read

Finally seeing benefits of MTP after removing GGML_CUDA_ALLREDUCE

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

Been fighting this for a while: MTP was seeing lows of 17 to sometimes the 30s. Today I dug deep and tried so many different configurations, cmake rebuilds, you name it. After all that, I finally tried removing GGML_CUDA_ALLREDUCE and saw a nice uplift in tps!

Just posting in case anyone sees this and finds themselves in a similar situation. It didn't occur to me to remove that env var because it's usually considered beneficial, but once I removed it, whammo!
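For anyone who wants to A/B this without editing their shell profile, here is a minimal sketch of launching a process with the variable stripped from its environment. The helper names `env_without` and `run_without_var` are made up for illustration, and the command you pass in is whatever you already use to start your server; nothing here is specific to llama.cpp beyond the variable name from the post.

```python
import os
import subprocess


def env_without(name, base=None):
    """Return a copy of the environment with `name` removed, if present."""
    env = dict(os.environ if base is None else base)
    env.pop(name, None)  # no error if the variable was never set
    return env


def run_without_var(cmd, name="GGML_CUDA_ALLREDUCE"):
    """Run `cmd` (a list of arguments) with `name` stripped from its environment.

    `cmd` is your own launch command; this sketch does not assume any
    particular binary or flags.
    """
    return subprocess.run(cmd, env=env_without(name), check=False)
```

Run the same command once normally and once through `run_without_var` and compare tps. In a POSIX shell, `unset GGML_CUDA_ALLREDUCE` before launching does the same thing for the current session.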

https://imgur.com/a/obaIkVy

submitted by /u/Bulky-Priority6824