OpenMythos benchmarks
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Hey everyone! The OpenMythos benchmarks are finally here. Sorry it took about a week to post these. The delay was mainly because the SWE-bench results weren't matching Qwen 3.6 27B's official numbers. It turns out Qwen used a different eval harness and also refined/filtered the benchmark problems; even the SWE-bench Verified score for their previous version, 3.5 (72.4), doesn't match the numbers published with 3.6 (75). Anyway, here are the results across SWE-bench Pro, CyberGym, and cybench. Also, huge thanks to u/giveen.
Demo: https://huggingface.co/spaces/build-small-hackathon/OpenMythos
Model: https://huggingface.co/build-small-hackathon/OpenMythos