Open Dungeon: local roleplay with Gemma 4 QAT + inline Uncen-FLUX images, running at full 256K context under 8GB RAM (OS)
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
| I wanted AI Dungeon but fully local and actually private, so I built it. The narrator is Gemma 4 (QAT Q4) through Ollama, and when a scene is worth showing it draws the picture too, locally, with FLUX. No API keys, no cloud, nothing leaves your machine. The part that surprised me: you can run the 12B at its full 256k context and it still only sits around 7.7GB of RAM, because Gemma 4 barely grows the KV cache. So the narrator can basically hold the whole story in its head. Old scenes that do scroll out get folded into a running summary so it never forgets what happened in chapter one. It plays like you would expect: Do / Say / Story modes, Continue, Retry, Erase, edit any line. Pick your model in the UI and it shows you the RAM cost up front. Mac one-click build in releases, or run from source. MIT, would love for people to break it and tell me what is missing. [link] [comments] |
More from r/LocalLLaMA
-
Palantir CEO rages against closed models
Jul 2
-
SenseNova-U1-8b-MoT-Infographic-V2 (released yesterday) - An open source SOTA beast for infographic design and image editing.
Jul 2
-
[Benchmark] Kimi K2.7 Code Q3 on Mac Studio M3 Ultra + RTX PRO 6000 over llama.cpp RPC: prefill improves, no changes in token generation/decode
Jul 2
-
They fit! Mostly.... 2x 3090, Thermaltake Core p3
Jul 2
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.