r/LocalLLaMA · June 12, 2026 · 1 min read

Open Dungeon: local roleplay with Gemma 4 QAT + inline Uncen-FLUX images, running at full 256K context under 8GB RAM (OS)

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

Open Dungeon: local roleplay with Gemma 4 QAT + inline Uncen-FLUX images, running at full 256K context under 8GB RAM (OS)

I wanted AI Dungeon but fully local and actually private, so I built it. The narrator is Gemma 4 (QAT Q4) through Ollama, and when a scene is worth showing it draws the picture too, locally, with FLUX. No API keys, no cloud, nothing leaves your machine.

The part that surprised me: you can run the 12B at its full 256k context and it still only sits around 7.7GB of RAM, because Gemma 4 barely grows the KV cache. So the narrator can basically hold the whole story in its head. Old scenes that do scroll out get folded into a running summary so it never forgets what happened in chapter one.

It plays like you would expect: Do / Say / Story modes, Continue, Retry, Erase, edit any line. Pick your model in the UI and it shows you the RAM cost up front.

Mac one-click build in releases, or run from source. MIT, would love for people to break it and tell me what is missing.

https://github.com/newideas99/open-dungeon

submitted by /u/akroletsgo
[link] [comments]

Discussion (0)

No comments yet. Sign in and be the first to say something.

Discussion (0)

More from r/LocalLLaMA