I built a coding agent that gets 87% on benchmarks with a 4B parameter model, here's how
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
| I was frustrated that every coding agent (OpenCode, Cursor, Claude Code) assumes you're running GPT-5.4 or Claude Opus. If you try them with a local model like Gemma or Qwen they fall apart. I find that often tool calls fail, context overflows, multi-step tasks collapse. So I built SmallCode. It's designed from the ground up for small local models. The result: 87/100 benchmark tasks pass with a Gemma 4 model that only activates 4B parameters per token. OpenCode scores ~75% with 14B models. The harness does the heavy lifting, not the model size. How it works (the tricks that make small models reliable):
What it looks like: Full-screen terminal UI (like OpenCode/vim), scrollable chat, command palette with What it doesn't do:
Install: Point it at LM Studio, Ollama, or any OpenAI-compatible endpoint. MIT licensed, everything's on GitHub: https://github.com/Doorman11991/smallcode Happy to answer questions about the architecture or benchmark methodology. [link] [comments] |
More from r/LocalLLaMA
-
What's in your RAG?
Jul 2
-
Palantir CEO rages against closed models
Jul 2
-
A cheap trick for reliable structured output: feed the validation error back into the retry
Jul 2
-
SenseNova-U1-8b-MoT-Infographic-V2 (released yesterday) - An open source SOTA beast for infographic design and image editing.
Jul 2
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.