Qwen3.6:27B VRAM 16GB 5080: MTP Quant, Speeds, and Configs
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
For those of you running Qwen3.6:27B on 16GB VRAM, what quantization did you settle on?
My primary use is a Home Assistant (HA) voice assistant, and for that my target is more than 50 tokens/s for token generation (tg) and more than 800 tokens/s for prompt processing (pp). Qwen3.5:9B is really fast, but I'm experimenting with a more capable model. I offloaded the vision model (mmproj) to the CPU because it is rarely used.
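To put those two targets in terms of response time, here is a rough back-of-the-envelope sketch in Python. The function name `estimate_latency_s` and the example token counts are illustrative assumptions, not measurements; it only adds prompt time and generation time, and ignores network, speech-to-text, and text-to-speech overhead.

```python
def estimate_latency_s(prompt_tokens: int, output_tokens: int,
                       pp_tps: float, tg_tps: float) -> float:
    """Seconds to process the prompt plus seconds to generate the reply."""
    if pp_tps <= 0 or tg_tps <= 0:
        raise ValueError("throughput must be positive")
    return prompt_tokens / pp_tps + output_tokens / tg_tps


# Hypothetical request: 2000-token prompt, 100-token reply, at the target floor.
latency = estimate_latency_s(2000, 100, pp_tps=800, tg_tps=50)
print(f"{latency:.2f} s")  # 2000/800 + 100/50 = 2.5 + 2.0 -> 4.50 s
```

For short spoken replies the generation term usually dominates, which is why the tg target matters more than it might first appear.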
Currently running Qwen3.6-27B-Q3_K_S.gguf with 64 layers on GPU at the following speeds:
prompt eval time = 462.66 ms / 507 tokens ( 0.91 ms per token, 1095.83 tokens per second)
eval time = 18710.17 ms / 884 tokens ( 21.17 ms per token, 47.25 tokens per second)
total time = 19172.84 ms / 1391 tokens
draft acceptance rate = 0.59677 ( 481 accepted / 806 generated)

prompt eval time = 6001.34 ms / 8561 tokens ( 0.70 ms per token, 1426.51 tokens per second)
eval time = 2404.46 ms / 147 tokens ( 16.36 ms per token, 61.14 tokens per second)
total time = 8405.80 ms / 8708 tokens
draft acceptance rate = 0.80357 ( 90 accepted / 112 generated)

Config:
-m /models/Qwen3.6-27B/Qwen3.6-27B-Q3_K_S.gguf
--mmproj /models/Qwen3.6-27B/mmproj-BF16.gguf
--no-mmproj-offload
--host 0.0.0.0
--port 8080
--jinja
-fa on
--temp 0.6
--top-p 0.95
--top-k 20
--min_p 0.0
--presence-penalty 1.5
--repeat-penalty 1.0
--cache-ram 0
--fit on
-np 2
--fit-ctx 32000
--cache-type-k q8_0
--cache-type-v q8_0
--cache-type-k-draft q8_0
--cache-type-v-draft q8_0
--log-verbosity 4
--chat-template-kwargs '{"preserve_thinking": true}'
--spec-type draft-mtp
--spec-draft-n-max 2
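If you want to compare your own runs against the numbers above, a small Python sketch like the following can pull the speeds and the draft acceptance rate out of the server's timing lines. It assumes the log lines look exactly like the ones quoted in this post; the names `summarize`, `TIMING_RE`, and `DRAFT_RE` are made up for the example, and the 50/800 thresholds are just the targets stated earlier.

```python
import re

# Matches timing lines such as
# "eval time = 18710.17 ms / 884 tokens ( ... )"
TIMING_RE = re.compile(
    r"^\s*(prompt eval|eval) time\s*=\s*([\d.]+) ms\s*/\s*(\d+) tokens"
)
# Matches "( 481 accepted / 806 generated)"
DRAFT_RE = re.compile(r"\(\s*(\d+) accepted\s*/\s*(\d+) generated\)")


def summarize(log: str) -> dict:
    """Return pp/tg tokens per second and draft acceptance for one request."""
    out = {}
    for line in log.splitlines():
        m = TIMING_RE.search(line)
        if m:
            key = "pp_tps" if m.group(1) == "prompt eval" else "tg_tps"
            # tokens divided by elapsed seconds
            out[key] = int(m.group(3)) / (float(m.group(2)) / 1000.0)
        m = DRAFT_RE.search(line)
        if m:
            out["acceptance"] = int(m.group(1)) / int(m.group(2))
    return out


sample = """\
prompt eval time = 462.66 ms / 507 tokens ( 0.91 ms per token, 1095.83 tokens per second)
eval time = 18710.17 ms / 884 tokens ( 21.17 ms per token, 47.25 tokens per second)
total time = 19172.84 ms / 1391 tokens
draft acceptance rate = 0.59677 ( 481 accepted / 806 generated)
"""

stats = summarize(sample)
print({k: round(v, 2) for k, v in stats.items()})
print("meets targets:", stats["tg_tps"] > 50 and stats["pp_tps"] > 800)
```

On the first run above, generation comes in just under the 50 tokens/s target at a draft acceptance rate of about 0.60, while the second run, with acceptance near 0.80, clears both targets.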