r/LocalLLaMA · June 19, 2026 · 1 min read

How do I set the right llama.cpp parameters?


--n-gpu-layers all --ctx-size 0 --reasoning-budget 0 --presence-penalty 1.1 --repeat-penalty 1.1

How do I figure out the optimal llama.cpp parameters for my setup? I run llama.cpp + Open WebUI in Docker on an AMD GPU (16 GB VRAM) with the Gemma 4 12B and 26B models.

Is it all about trial and error? Are there more materials I can study to learn beyond the llama.cpp docs?

Google provides recommended settings for temp (1.0), top-p (0.95), and top-k (64). Asking my LLM gives inconsistent results, so I'm looking for better recommendations from others.
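For anyone who wants to make the trial and error repeatable, here is a rough sketch of how the flags could be assembled per run. It only builds the llama-server command line; it does not launch anything or measure quality. The helper name build_command and the model path are made up for illustration, and the sampling values are simply the ones quoted above.

```python
import shlex

# Sampling values quoted in the post (Google's recommendation).
SAMPLING = {"--temp": 1.0, "--top-p": 0.95, "--top-k": 64}

# A subset of the flags already used in the post.
BASE = {"--n-gpu-layers": "all", "--ctx-size": 0, "--reasoning-budget": 0}


def build_command(model_path, overrides=None):
    """Return the llama-server argument list for one trial configuration.

    build_command is a hypothetical helper, not part of llama.cpp.
    """
    # Later dicts win, so overrides replace the defaults for a single trial.
    flags = {**BASE, **SAMPLING, **(overrides or {})}
    args = ["llama-server", "--model", model_path]
    for name, value in flags.items():
        args += [name, str(value)]
    return args


if __name__ == "__main__":
    # "gemma.gguf" is a placeholder path, not a real file name.
    print(shlex.join(build_command("gemma.gguf", {"--ctx-size": 8192})))
```

Changing one override at a time (context size first, then the penalties) and keeping the printed command next to the results makes it easier to see which flag actually caused a difference.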

submitted by /u/x6q5g3o7