r/LocalLLaMA · June 30, 2026

MTP-only GGUF subsets: Qwen3.5/3.6

These are just MTP-only GGUF subsets of Qwen3.5/3.6 Medium/Large (27B and above) models, meant to accelerate token generation for Qwen-based models that lack MTP tensors.

But I hope they help people experiment with various Qwen3.5/3.6-based fine-tunes.
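For readers new to MTP: the token-generation speedup mentioned above comes, roughly, from using the MTP layers as a cheap drafter whose guesses the full model then verifies (speculative decoding). The toy Python sketch below illustrates only that general draft-and-verify loop with greedy acceptance. The `target` and `draft` functions are made-up stand-ins rather than real models, and this is not how llama.cpp implements it internally.

```python
from typing import Callable, List, Tuple

# A toy "model": maps a token prefix to its greedy (argmax) next token.
NextToken = Callable[[List[int]], int]


def greedy_generate(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Plain autoregressive decoding: one sequential model step per new token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_generate(target: NextToken, draft: NextToken, prompt: List[int],
                         n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy draft-and-verify decoding.

    Returns (tokens, verification_rounds). The tokens are identical to
    greedy_generate(target, ...); the saving is that each round can accept
    up to k + 1 tokens instead of one.
    """
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < n_new:
        rounds += 1
        # 1. The cheap drafter (e.g. an MTP head) proposes k tokens in a row.
        proposal, ctx = [], list(out)
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. The target checks the proposal. A real engine scores all k
        #    positions in one batched forward pass; this loop is for clarity.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            accepted.append(expected)  # equals tok unless this is a mismatch
            if expected != tok:
                break  # keep the target's correction, discard the rest
        else:
            # All k matched; a real verification pass already yields the
            # target's next token, so it is kept as a bonus.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[: len(prompt) + n_new], rounds


# Usage with made-up stand-ins: the target counts up mod 10,
# and the drafter agrees with it except right after token 5.
def target(prefix: List[int]) -> int:
    return (prefix[-1] + 1) % 10


def draft(prefix: List[int]) -> int:
    return 0 if prefix[-1] == 5 else (prefix[-1] + 1) % 10


seq, rounds = speculative_generate(target, draft, [0], n_new=12)
assert seq == greedy_generate(target, [0], 12)
print(rounds)  # 4 rounds instead of 12 sequential target steps
```

In this greedy sketch, only tokens the target itself would have produced are ever kept, so a drafter that guesses worse costs speed, not output quality. That is presumably why stock Qwen MTP tensors can still be worth trying on a fine-tune.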

I originally created some of these MTP-only subsets to accelerate token generation of trohrbaugh/Qwen3.5-122B-A10B-heretic (a self-converted version), but the main reason I published them is Ornith-1.0-35B.

1. To show exactly how Qwen3.5/3.6's MTP tensors can be embedded inside an existing GGUF file (and to make doing so easy).
   I recently found that one of the Ornith-1.0-35B quants embeds MTP tensors, stating that they come from Qwopus3.6-35B-A3B, and... those MTP tensors are just the original Qwen ones.
2. To make MTP-only models with dual uses available: as a separate draft model file, or as a model file for grafting.
   Some MTP-only subsets (in GGUF format) are small but usable only for grafting (i.e., transplanting MTP-related tensors); they cannot be used as a separate draft model file, which llama.cpp supports via --model-draft on llama-server. I hope that publishing easy-to-test model files makes experimenting with Qwen3.5/3.6-based fine-tunes easier.
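The grafting idea (transplanting only the MTP-related tensors from an MTP-only subset into an existing GGUF of a compatible fine-tune) can be modeled on an in-memory tensor table. This is a hypothetical sketch of the step, not the tool used to build these files: the `.nextn.` tensor-name marker and the `embedding_length` compatibility check are assumptions, and a real tool would also have to copy tensor data and any MTP-related metadata into an actual GGUF file (for example with the `gguf` Python package from the llama.cpp repository).

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# HYPOTHETICAL naming: real Qwen3.5/3.6 GGUFs may name MTP tensors differently.
MTP_MARKER = ".nextn."


@dataclass
class TensorInfo:
    shape: Tuple[int, ...]
    dtype: str  # e.g. "Q8_0", "F16"


@dataclass
class GGUFModel:
    """In-memory stand-in for a GGUF file: metadata plus named tensors."""
    metadata: Dict[str, object] = field(default_factory=dict)
    tensors: Dict[str, TensorInfo] = field(default_factory=dict)


def is_mtp_tensor(name: str) -> bool:
    return MTP_MARKER in name


def graft_mtp(base: GGUFModel, mtp_subset: GGUFModel,
              width_key: str = "embedding_length") -> GGUFModel:
    """Return a copy of `base` with the MTP tensors of `mtp_subset` added.

    `width_key` is a hypothetical metadata key for the hidden size; both
    models must agree on it, since MTP tensors are shaped for one model size.
    """
    if base.metadata.get(width_key) != mtp_subset.metadata.get(width_key):
        raise ValueError("hidden sizes differ: subset belongs to another model size")
    if any(is_mtp_tensor(n) for n in base.tensors):
        raise ValueError("base model already contains MTP tensors")
    # Only MTP-related tensors are transplanted; shared tensors stay the base's.
    mtp = {n: t for n, t in mtp_subset.tensors.items() if is_mtp_tensor(n)}
    if not mtp:
        raise ValueError("no MTP tensors found in the subset")
    # Build new dicts so neither input model is modified.
    return GGUFModel(metadata=dict(base.metadata), tensors={**base.tensors, **mtp})


# Usage with toy shapes (not real Qwen dimensions):
base = GGUFModel({"embedding_length": 64},
                 {"token_embd.weight": TensorInfo((64, 1000), "F16"),
                  "blk.0.attn_q.weight": TensorInfo((64, 64), "F16")})
subset = GGUFModel({"embedding_length": 64},
                   {"token_embd.weight": TensorInfo((64, 1000), "Q8_0"),
                    "blk.1.nextn.eh_proj.weight": TensorInfo((128, 64), "F16")})
merged = graft_mtp(base, subset)
print(sorted(merged.tensors))
```

The subset's own `token_embd.weight` is deliberately ignored here, so the fine-tune keeps its own embeddings and only gains the MTP block.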

Hope that they help someone.

Edit (2026-07-01): Added an MTP-only GGUF subset of Qwen3.5-9B, since there are many fine-tunes based on that model (there are no plans for 4B or smaller).

submitted by /u/a4lg