r/LocalLLaMA · 1 min read

SWE-rebench leaderboard update: GLM-5.2, Qwen3.6-27B, Qwen3.6-35B-A3B, Gemma 4 31B and more + improved UI

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

Hi all,

We made several updates to the SWE-rebench leaderboard: added new models, refreshed recent results, and reworked the leaderboard UI to make results easier to read, compare, and understand.

New Models:

  • Claude Opus 4.8 xhigh: 56.5% — 2.48M tokens
  • GLM-5.2: 51.1% — 2.62M tokens
  • Gemini 3.5 Flash: 49.5% — 1.85M tokens
  • MiniMax M3: 45.6% — 6.89M tokens
  • DeepSeek-V4 Pro: 42.7% — 2.25M tokens
  • MiMo V2.5 Pro: 42.4% — 2.59M tokens
  • DeepSeek-V4 Flash: 38.4% — 3.00M tokens
  • Qwen3.6-27B: 36.5% — 1.88M tokens
  • Qwen3.6-35B-A3B: 33.8% — 2.23M tokens
  • Gemma 4 31B: 16.5% — 2.24M tokens
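To look at score and token usage together, here is a minimal Python sketch that uses only the figures listed above. The "points per million tokens" ratio is an ad hoc illustration, not a metric the leaderboard reports, and the post does not say what the token counts are measured over, so treat the ordering as a rough guide. The function names are made up for this example.

```python
# Figures copied from the list above: (model, score in %, tokens in millions).
RESULTS = [
    ("Claude Opus 4.8 xhigh", 56.5, 2.48),
    ("GLM-5.2", 51.1, 2.62),
    ("Gemini 3.5 Flash", 49.5, 1.85),
    ("MiniMax M3", 45.6, 6.89),
    ("DeepSeek-V4 Pro", 42.7, 2.25),
    ("MiMo V2.5 Pro", 42.4, 2.59),
    ("DeepSeek-V4 Flash", 38.4, 3.00),
    ("Qwen3.6-27B", 36.5, 1.88),
    ("Qwen3.6-35B-A3B", 33.8, 2.23),
    ("Gemma 4 31B", 16.5, 2.24),
]


def points_per_million_tokens(score, tokens_m):
    """Hypothetical efficiency ratio: score (%) divided by tokens (millions)."""
    return score / tokens_m


def rank_by_efficiency(results):
    """Return the rows sorted by the ratio above, highest first."""
    return sorted(
        results,
        key=lambda row: points_per_million_tokens(row[1], row[2]),
        reverse=True,
    )


if __name__ == "__main__":
    for model, score, tokens_m in rank_by_efficiency(RESULTS):
        ratio = points_per_million_tokens(score, tokens_m)
        print(f"{model:<24} {score:5.1f}%  {tokens_m:4.2f}M  {ratio:5.2f}")
```

On this rough ratio, Gemini 3.5 Flash comes out first and MiniMax M3 last, since MiniMax M3 uses far more tokens than any other entry.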

For r/LocalLLaMA, the most interesting part is probably the local / self-hosted model results. Qwen3.6-27B is strong for its size at 36.5%, just behind DeepSeek-V4 Flash (38.4%), while Qwen3.6-35B-A3B (33.8%) and Gemma 4 31B (16.5%) are also now on the board for comparison.

Which local models should we test? Let us know which ones you use for coding agents or local development, and we’ll consider adding them in future updates.

Links:

> Leaderboard: https://swe-rebench.com/

> Our Discord: https://discord.gg/V8FqXQ4CgU

> X post with the update: https://x.com/ibragim_bad/status/2072318238407483593?s=20

> Harbor (if you want to run the agent on your own): https://hub.harborframework.com/datasets/swe-rebench/swe-rebench-leaderboard/latest

submitted by /u/Fabulous_Pollution10