Qwen3.6-35B-A3B and 9B are officially on the public Terminal-Bench 2.0 leaderboard!
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Qwen3.6-35B-A3B and 9B are officially on the public Terminal-Bench 2.0 leaderboard!
little-coder × Qwen3.6-35B-A3B hit 24.6% (±3.2), and now land above Gemini 2.5 Pro on Gemini CLI (19.6%) and Qwen3-Coder-480B on Terminus 2 (23.9%). I didn’t expect the scaffold-model gap from Polyglot to hold on a benchmark this hard but it did!
little-coder × Qwen3.5-9B came in at 9.2% which is more humble. Yet, it also shows again that sub-10B local models are now measurable on a hard agentic benchmark, not assumed unworthy of a slot.
Just felt it was right to follow up here as you requested, and say a genuine thanks to this community. It really is the place currently driving innovation toward less compute, and this run exists there because you pushed for it.
Now it’s time to head for the top of the leaderboard 👀 let’s go open source!
[link] [comments]
More from r/LocalLLaMA
-
Local benchmarks with a RTX 3090 - Qwen3.6 27b vs Ornith
Jul 2
-
July 4th is coming up, is there any vision model that's good for picking up fire?
Jul 2
-
It's officially over. One of the fathers of AI at Nvidia doesn't believe in AGI and compares OpenAI and Anthropic's closed models to AOL and Prodigy's closed internets. Says the future is every business having a customized open source model.
Jul 2
-
6x P40 running Minimax M2.7_Q3_XL
Jul 2
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.