GLM 5.2: 98% of max level intelligence with less than half of tokens usage
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
| According to this number of reasoning tokens from GLM 5.1 to GLM 5.2 more than doubled from 16.7k to 36.7k and for me as a local user with old junk Xeon setup this makes GLM 5.2 unusable to the extent where I had to shut down model after 12h of waiting it to respond to my math problem question. But then I saw this graph from z_ai technical report, which basically implies that you can use less than half of the tokens of max effort on high level and still get around 98% of max level intelligence at least in coding tasks. So I encourage both local and API users to try high level, because by default GLM 5.2 is set to max level. [link] [comments] |
More from r/LocalLLaMA
-
Palantir CEO rages against closed models
Jul 2
-
A cheap trick for reliable structured output: feed the validation error back into the retry
Jul 2
-
SenseNova-U1-8b-MoT-Infographic-V2 (released yesterday) - An open source SOTA beast for infographic design and image editing.
Jul 2
-
[Benchmark] Kimi K2.7 Code Q3 on Mac Studio M3 Ultra + RTX PRO 6000 over llama.cpp RPC: prefill improves, no changes in token generation/decode
Jul 2
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.