GLM-5.2 benchmarked on DeepSWE: Beats Gemini & GPT-5.4, but the token volume/cost makes it wildly inefficient? (Theo - t3.gg)
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Saw this breakdown from Theo (t3.gg) on X showing the latest DeepSWE leaderboard stats for the new GLM-5.2 open-weight model.

The good news: it's officially surpassing GPT-5.4 and the entire Gemini lineup in raw coding capability. Seeing an open-weight model punch that high is incredibly dope.

The catch? It is not cheap to run.

According to the chart:

- GPT-5.5 (medium) and Claude Opus 4.8 (high) are both cheaper and smarter on an average cost-per-task basis.
- GLM-5.2 is sitting far lower on the efficiency curve despite its open-weight status.

Theo points out a massive caveat in the replies: GLM-5.2 apparently uses way more output tokens. So even if the baseline token cost looks cheap on paper, the sheer volume of tokens required to complete a task drives the total cost way up.
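To make the token-volume point concrete, here's a rough sketch of the arithmetic. Everything in it is made up for illustration: the model names, prices, and token counts are hypothetical placeholders, not numbers from Theo's chart or from DeepSWE. It also only counts output tokens and ignores input tokens, caching, and retries, so treat it as back-of-the-envelope math and not a real cost model.

```python
from dataclasses import dataclass


@dataclass
class ModelRun:
    """One model's pricing and verbosity. All values here are hypothetical."""
    name: str
    usd_per_million_output_tokens: float
    avg_output_tokens_per_task: int

    def cost_per_task(self) -> float:
        # Per-task cost = per-token price * number of tokens actually emitted.
        return (
            self.usd_per_million_output_tokens
            * self.avg_output_tokens_per_task
            / 1_000_000
        )


# Placeholder numbers chosen only to show the effect, NOT benchmark data.
cheap_but_verbose = ModelRun("cheap-verbose-model", 2.0, 900_000)
pricey_but_terse = ModelRun("pricey-terse-model", 10.0, 120_000)

for run in (cheap_but_verbose, pricey_but_terse):
    print(f"{run.name}: ${run.cost_per_task():.2f} per task")
```

With these placeholder values, the model with the 5x cheaper token price still ends up costing more per task, because it emits 7.5x as many tokens. That's the shape of the caveat Theo is describing, whatever the actual numbers turn out to be.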