r/LocalLLaMA
r/LocalLLaMA community 4h ago
They fit! Mostly.... 2x 3090, Thermaltake Core p3
Got another 3090. Had to print a bracket to angle the radiator and make room for the GPUs 💀 Ended up liking the look more than I thought... qwen 27b go brrrrr   submitted by /u/anthonyg45157
6 -
r/LocalLLaMA community 5h ago
Making LLMs Better at Creative Writing using Entropy
submitted by /u/CountBayesie
26 -
r/LocalLLaMA community 11h ago
ZCode: New Agentic Code Editor from the Makers of GLM
submitted by /u/johnnyApplePRNG
16 -
r/LocalLLaMA community 14h ago
Deepseek Flash V4 at IQ2 or Qwen 3.6 27B Q5KM ? Any tests or benchmarks ?
Wondering which one would be better at speed / coding / reasoning.   submitted by /u/soyalemujica
32 -
r/LocalLLaMA community 15h ago
How to improve RAM offload?
I have only 12GB VRAM (RTX 3060) but enough RAM to run Qwen3.6 27B Q4 with offload. I don't expect it to reach maximum performance, but why is DRAM speed only around 30 GB/s (HWiNFO data) during inference with dual-channel 5200 RAM? TG is 3.12 tok/sec with 18K…
38 -
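A rough back-of-envelope for the question above, as a sketch rather than a diagnosis. It assumes a 64-bit (8-byte) bus per memory channel and that token generation is memory-bandwidth bound, and it plugs in only the figures quoted in the post (dual-channel 5200 MT/s, about 30 GB/s observed, 3.12 tok/sec). The function names are hypothetical, chosen for illustration.

```python
def peak_dram_bandwidth_gbs(mt_per_s: float, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s.

    Assumption: each channel is 64 bits (8 bytes) wide, the usual
    figure quoted for a desktop memory channel.
    """
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


def ram_gb_read_per_token(observed_bw_gbs: float, tokens_per_s: float) -> float:
    """GB streamed from RAM per generated token.

    Assumption: decoding is bandwidth bound, so the average bandwidth
    divided by the token rate approximates the data read per token.
    """
    return observed_bw_gbs / tokens_per_s


# Figures taken from the post: dual-channel 5200 MT/s, ~30 GB/s, 3.12 tok/sec.
peak = peak_dram_bandwidth_gbs(5200)
per_token = ram_gb_read_per_token(30.0, 3.12)

print(f"theoretical peak: {peak:.1f} GB/s")
print(f"RAM traffic per token: {per_token:.1f} GB")
```

Under these assumptions the theoretical peak is about 83 GB/s, and 30 GB/s at 3.12 tok/sec implies roughly 9.6 GB streamed from RAM per token. One possible reason the average sits below peak is that RAM is idle while the GPU works through its own layers, but that is a guess, not something the post confirms.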
r/LocalLLaMA community 15h ago
Couldn't hold back
I had been waiting for months and the cards finally got delivered today. No one at my workplace was excited, maybe because no one cares about the AI stuff that I work on. But I just wanted to share it with you guys. Can't wait to build the server and start working on them.  …
11 -
r/LocalLLaMA community 16h ago
Open Models - June 2026
After an overwhelming April and an OK May, here's June. Yes, the graph has fewer items, because we got other items here last month. Finetunes: Nex-N2, Ornith-1.0, Agents-A1, Holo3.1, Tmax-27b, MusaCoder-27B, VibeThinker-3B. NVFP4 from NVIDIA for the models below:…
8 -
r/LocalLLaMA community 18h ago
Agent execution visualizer
I've seen projects that stream tool-use status and subagent generation and represent it with a nice little visual based on the tool being used, etc. It would be pretty cool to pair this with some live model visualisations like a QKV heatmap across attention heads. Not for…
28 -
r/LocalLLaMA community 18h ago
Deepseek V4 Flash 2, 3 and 4 bits GGUFs
submitted by /u/tarruda
31 -
r/LocalLLaMA community 18h ago
Best tps can I get with Qwen3.5 122B on 32GB VRAM + 64GB RAM?
My attempt at running Qwen3.5 122B on my 5090 (32GB VRAM) + 64GB RAM is really bleak. I'm getting a speed that starts at 6 tps and ends at ~20 tps. Can I improve this further? build/bin/llama-server \ -m…
21 -
r/LocalLLaMA community 1d ago
Claude Code Is Steganographically Marking Requests
submitted by /u/johnnyApplePRNG
21 -
r/LocalLLaMA community 1d ago
Is there an alternative to C-Payne for 100-lane PCIe 5.0 switches? Needed for 8-GPU build.
Sadly, Christian is on vacation or something, which is a shame because the C-Payne PCIe gear is the best around. In the meantime, I need this to add some urgent compute capacity:…
15 -