Deepseek V4 Flash running on RTX 5090 MoE
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Here are the results of optimizing it for my setup:

My setup:

This was possible using this fork: https://github.com/fairydreaming/llama.cpp/tree/dsv4

Build script:

Benchmark command:

Daily use command:

Yes, 1 million context. It fits with ub 512, and there's even a little bit of VRAM left to use. You can even fit --n-cpu-moe 37 or 36 if your OS is really lean.

Thanks to u/tarruda for the Q2_K model and for helping dig into all the fixes needed to get this going!