Tag: Edge · 208 articles archived under #edge
r/LocalLLaMA community 11d ago Watch local LLMs escape the rooms you design Hello! I'd like to share my repo for WATCH MY ESCAPE: https://github.com/cjami/watch-my-escape It's an inverted escape room game where you design the maps and LLMs have to try to escape them. It uses traditional action verbs (e.g. push, pull, pick-up) to interact with the… 34
r/LocalLLaMA community 11d ago What are people doing with their local models and what tools do you use them with? I am trying to come up with some more uses for my DGX Sparks. Curious which tools work best for things like coding as well. What do you use instead of things like the claude.ai web interface? I have played with OpenWebUI but it just doesn't seem as capable without a lot of… 31
r/LocalLLaMA community 11d ago It’s time to decentralize model distribution! Introducing Noema Atlas TL;DR: Noema Atlas is a peer-to-peer network software using Iroh for local LLM weights, free and open source (Apache-2.0). Models come from whichever peers have them, with Hugging Face and mirrors as fallback (opt-in). Every file is identified by its content hash and a signed… 38
r/LocalLLaMA community 11d ago You can now convert EXL3 quants on Apple Silicon Mac Hi, I'm here with an update. But this time it's quite a bigger news on local llm. Normally accessing the high fidelity quant like EXL3 is CUDA gated, and imagine you need 96GB-128GB with RTX cards, they are very specialized and expensive. But now on a more general basis, MacOS… 38
r/LocalLLaMA community 11d ago Best local LLM for English story summarization Hello, which local LLM is currently the best at story summarization? The stories can be multiple pages long and are in English. Thanks! submitted by /u/DesperateGame 24
r/LocalLLaMA community 12d ago Improving local models with an API based "consultant"?
I'm sure that someone else has come up with this before, but i just wanted to ask: Has it occurred to anyone to improve their local AI workflow by adding a more powerful API based "consultant" agent (GLM 5.2 now springs to mind) to call upon for refining plans, learnings and… 35
r/LocalLLaMA community 12d ago Is my CPU and RAM too weak/ lees for local LLMs? Both are going 100% for simple test prompts. GPU is not getting used fully. In theory quen3.5:9b should fit and run on RTX3050 8 GB comfortably. https://preview.redd.it/i69vee9mi88h1.png?width=1592&format=png&auto=webp&s=820720e8a3e1d5386d49119a235e2902acc13265 I am very new to this local llm world. Just started to exploring from past 3days. Share any troubleshooting tips. submitted by /u/mr_whoisGAMER… 12
arXiv — Machine Learning research 13d ago Low-Energy Reduced RISC-V Instruction Subset Processor for Tsetlin Machine Inference at the Edge arXiv:2606.19964v1 Announce Type: new Abstract: Tsetlin Machine (TM) is a logic-based machine learning approach that relies on simple bitwise operations and finite-state automata, which makes it attractive for edge AI deployments. Recent work has focused on co-processor and… 23
r/LocalLLaMA community 13d ago gave my local llm agent mcp tools for local image + video gen, so it just generates when i ask (fully offline+free) free and open source, runs fully offline. the local llm agent does the image and video gen itself via mcp tools. details and github in the comments. submitted by /u/GroundbreakingMall54 33
r/LocalLLaMA community 14d ago Lemonade v10.8: auto memory management, cloud offload, Omni improvements, and call your local models as MCP tools v10.8 is out, so here's a project update on what landed. This was a 20-contributor release in just 7 days!
Smarter memory and context management Dynamic VRAM management now auto-unloads idle models and downsizes their KV-cache to reclaim GPU memory on the fly, plus model pinning… 27
r/LocalLLaMA community 14d ago I released a local LLM-powered RPG where generated NPCs, locations, items, and quests persist as in-game objects In this game, NPCs, locations, items, quests, and other elements are generated not as one-off text, but as persistent in-game objects. The LLM handles dialogue, narration, situational interpretation, quest progression, and similar parts of the experience. Meanwhile, the game… 19
r/LocalLLaMA community 15d ago Local models went from mostly useless to actually useful really fast. What changed? https://preview.redd.it/knc4ht7bft7h1.png?width=1048&format=png&auto=webp&s=49abdb8b0f358e799ecb06aa49134d9b0fd49336 Mitchell Hashimoto had a good point earlier: local models went from basically useless to actually useful in what feels like one year. I think thats pretty… 5
arXiv — Machine Learning research 15d ago AnchorKV: Safety-Aware KV Cache Compression via Soft Penalty with a Refusal Anchor arXiv:2606.17872v1 Announce Type: new Abstract: Large language models (LLMs) outperform earlier architectures on generative inference and long-context tasks, but their large size introduces significant challenges in memory usage, energy cost, and on-device deployment. Since… 27
r/LocalLLaMA community 15d ago Hashicorp founder thinks local models "aren't good ENOUGH yet" Generally, respect him a lot, but this is a wrong take. More than 1 year ppl are doing alright using SLMs for coding; only vibecoders might struggle Link submitted by /u/Orbit652002 24
NVIDIA Developer Blog official-blog 15d ago Build On-Device AI Companions with the NVIDIA ACE Game Agent SDK and Unreal Engine 5 Plugins NVIDIA RTX technologies are deeply integrated into Unreal Engine 5 through the NVIDIA RTX Branch of Unreal Engine and the NVIDIA DLSS Unreal Engine plugin. This...
23
Simon Willison community 15d ago Quoting Georgi Gerganov I can 100% attest to the fact that Qwen3.6-27B is a very capable local model for coding tasks. Over the last month and a half I've been using it almost daily, either on my M2 Ultra or on my RTX 5090 box. I use it for small mundane tasks at ggml-org - nothing really impressive,… 9
Hacker News — AI on Front Page community 15d ago Running local models is good now Article URL: https://vickiboykis.com/2026/06/15/running-local-models-is-good-now/ Comments URL: https://news.ycombinator.com/item?id=48555993 Points: 299 # Comments: 159 18
r/LocalLLaMA community 16d ago Are small local models for automation a thing? I’ve been following this sub for a while, and it feels like the massive hype is always around having a local vibe coding assistant or trying to run heavy, near-frontier models locally, and that’s amazing. But I feel like we are overlooking a massive use case, for me, an… 5
r/LocalLLaMA community 16d ago I made a game where you convince an AI model that reality is a simulation. Progress update: Showed you all my demo last week, had some great conversations with some very smart folk, and spent days fixing bugs and trying things out. And now, I humbly present to you: Simulation Simulator! A chat simulator game that bundles a local LLM inside Unity, and… 5
Hacker News — AI on Front Page community 16d ago Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding? Has anyone here fully swapped Claude/GPT for a local model as their main coding tool, not just for side experiments?
If so, please share your setup and performance (e.g tok/s) Comments URL: https://news.ycombinator.com/item?id=48542100 Points: 510 # Comments: 255 23
r/LocalLLaMA community 16d ago archex: local-first, deterministic code-context for AI agents — no API key, no telemetry (Apache 2.0) archex turns a repo into a ranked, token-budgeted context bundle for coding agents: the symbols, imports, dependency-graph neighbors, and provenance the model needs, assembled before it reasons. It returns context, not an answer — your local model still does the thinking. The… 24
arXiv — Machine Learning research 17d ago Efficient On-Device Diffusion LLM Inference with Mobile NPU arXiv:2606.13740v1 Announce Type: new Abstract: Diffusion large language models (dLLMs) accelerate generation by denoising multiple tokens in parallel, making them attractive for latency-sensitive mobile inference. However, repeated denoising introduces substantial computation… 35
arXiv — Machine Learning research 17d ago Federated Learning for Feature Generalization with Convex Constraints arXiv:2606.14416v1 Announce Type: new Abstract: Federated learning (FL) often struggles with generalization due to heterogeneous client data. Local models are prone to overfitting their local data distributions, and even transferable features can be distorted during aggregation.… 12
r/LocalLLaMA community 17d ago Made a macOS app that creates highly personal macOS apps. Works with models as small as Gemma 4 E2B Apologies in advance as the video is demonstrating with GPT 5.4 mini (a local model would take too long for a video), however I’ve made the same app with Gemma 4 E4B. Been working on an open source project for a while called Ironsmith. The gist is you can create highly… 13
r/LocalLLaMA community 17d ago Help with resources for using LLMs as fictional characters Hey ya'll, I'm an ex-cognitive scientist turned NLP Data Scientist by day, and science fiction author by night.
I want to bring fictional characters in my prose to life with Local LLMs, and I'm looking for the best resources out there for doing this kind of work (datasets,… 10
r/LocalLLaMA community 18d ago Local models in mid-2026 Open weights got close enough to run at home this year, not by needing more RAM but the reverse: sparse attention, MoE, latent KV compression, multi-token prediction and four-bit quant. submitted by /u/mattjcoles 11
r/LocalLLaMA community 18d ago Build for local LLM with 2 separate GPUs I want to build a headless compute machine to run a RTX Ada 4000 (20GB) with a RTX Pro 5000 (48GB) or RTX PRO 4500 (32GB) in parallel for inference. The goal is not running one large model using 2x GPUs, but rather running separate models on each GPU. Why these GPU config?… 19
r/LocalLLaMA community 18d ago I don’t know who needs to hear this but 128GB BD-R XL M-DISC is SOTA for consumer-available archival optical storage (for backing up your models) If you’re trying to download and preserve your local LLMs in case of future availability issues due to AI-related politics, your best bet is either 128gb or 100gb Blu-Ray optical disks, more specifically BD-R XL M-DISC standard format which are archival-grade and built to last… 21
r/LocalLLaMA community 18d ago In your opinion, what is the best CLI-based (or other) coding tool for regular software engineering (NOT VIBE CODING)? This includes but is not only limited to: OpenCode, Command Code, Kilo Code, Cline, Claude Code, etc. Please try to include tools in which I can connect local models, so not stuff like Antigravity. submitted by /u/Potential_Top_4669 36
r/LocalLLaMA community 19d ago We should set up a torrent network for open source models. Was just thinking about this due to recent events. Hugging Face is a US-based company, legally incorporated as Hugging Face, Inc. with its official headquarters located in Brooklyn, New York.
It seems like a pretty big single point of failure for local models. Maybe a… 23
r/LocalLLaMA community 19d ago Anthropic forced to abruptly disable Fable 5 & Mythos 5 globally by US Gov over a jailbreak. This is exactly why we need local models. I just saw this statement regarding Anthropic being hit with an emergency export control directive from the US government. They were forced to pull the plug on Fable 5 and Mythos 5 for all customers globally. The tl;dr is that the government got spooked by a narrow jailbreak… 10
r/LocalLLaMA community 19d ago Local LLMs aren't democratic anymore... the hardware barrier has gotten out of hand. When we first started experimenting with local LLMs, it was a completely different story! We were using gaming GPUs to tinker around. 8GB or 16GB of VRAM (which wasn't even a given for everyone) was the norm, and so many people could actually get their hands dirty and… 25
arXiv — NLP / Computation & Language research 20d ago sebis at CRF Filling 2026: A Two-Stage Local LLM Pipeline for Medical CRF Filling arXiv:2606.13082v1 Announce Type: new Abstract: The extraction of structured clinical information from unstructured EHR notes is a persistent bottleneck in healthcare informatics. While large language models (LLMs) offer high performance, their deployment in clinical settings is… 12
arXiv — NLP / Computation & Language research 20d ago TimeLens: On-Device Artifact Recognition with Retrieval-Augmented Question Answering for the Grand Egyptian Museum arXiv:2606.13267v1 Announce Type: cross Abstract: TimeLens is an AI-powered bilingual mobile guide for the Grand Egyptian Museum (GEM).
Pointing a phone at an exhibit, a visitor sees the artifact recognized in real time and can ask follow-up questions answered in English or… 37
r/LocalLLaMA community 20d ago xdna-top: unified NPU+iGPU terminal monitor for Strix Halo (Ryzen AI Max) — finally see the NPU work If you're running local models on a Ryzen AI Max / Strix Halo box, you've probably noticed it's hard to see what the NPU is actually doing. amd-smi is still broken on gfx1151 (ROCm #6035 (https://github.com/ROCm/ROCm/issues/6035)), and while GNOME Resources has a GUI view, I… 21
arXiv — NLP / Computation & Language research 21d ago Energy-Efficient On-Device RAG on a Mobile NPU: System Design and Benchmark on Snapdragon X Elite arXiv:2606.11257v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) pipelines are compute-intensive, combining embedding, retrieval, reranking, and large language model (LLM) generation. Running them entirely on-device benefits privacy, latency, and offline use,… 35
r/LocalLLaMA community 21d ago I wired a fully offline voice loop to Ollama + LM Studio — 100% CPU, no GPU, nothing leaves your machine (Silero VAD + Parakeet STT + Supertonic TTS 3) I kept wanting to talk to my local models instead of typing, but every voice setup wanted a GPU, shipped my audio to the cloud, or was macOS-only. So I built one that's none of those — and I benchmarked it, so these are real measured numbers, not vibes. One command installs the… 12
r/LocalLLaMA community 21d ago Tried to benchmark Google’s new on-device dictation models (Eloquent) and basically couldn’t I tried to benchmark Google’s new on-device dictation app (Eloquent) and basically couldn’t. It drops about half of my dictations. tl;dr Full results are 👉 here. Background: Google shipped a new fully‑local dictation app yesterday with proprietary new models, so I was excited… 5
r/LocalLLaMA community 21d ago Local LLM good for OCR of handwriting?
I am using qwen3-vl:8b and ollama for doing OCR on scans of handwritten letters and it is doing a decent job. Any other models I should know about for this kind of OCR? submitted by /u/SensitiveCranberry00 15
r/LocalLLaMA community 22d ago Local LLms releases Here are some graphs for the Local LLMs releases, it's strange except for the last month, i thought that this year was very heavy in terms of release, but is seems that the peak was last year. Maybe the hype about the quality improvement this year made it seems that it was… 4
r/LocalLLaMA community 22d ago Can you really replace paid models with a local model? Long time lurker, and I say this as someone who genuinely loves this community and runs many local models myself. I’ve been using LLMs since the early GPT and LLaMA days. Obviously, models have come a unbelievably long way. Local/open models today are dramatically better than… 13
r/LocalLLaMA community 22d ago Anthropic is intentionally nerfing Fable when asked to develop other LLMs Reason 458 why local LLMs are going to be a necessity submitted by /u/onil_gova 16
arXiv — Machine Learning research 22d ago Operator Fusion for LLM Inference on the Tensix Architecture arXiv:2606.09879v1 Announce Type: new Abstract: This study addresses on-device inference bottlenecks of Transformer models on Tenstorrent's Tensix architecture and proposes an operator fusion strategy that enhances data locality. RMSNorm is fused with matrix multiplication in… 35
r/LocalLLaMA community 22d ago Fine-tuned Qwen2.5-7B to 96% of Claude Haiku on a domain-specific task using ~$3 of API calls and zero human labelers Built a decision-reasoning engine (Orlog) and wanted to fine-tune a local model for it instead of paying per-call forever.
The method (DV-DPO): Run a 3-voice council on each question, produce a synthesis Cross-examine: losing voices challenge the synthesis If synthesis gets… 35
r/LocalLLaMA community 22d ago Furiosa AI selling inference chip to consumer market will be a game changer to local llm This is south Korean start up all-in on inference chip: https://furiosa.ai/renegade-spec Tsmc 5nm node Hynix HBM3 1.5TB/s 48GB VRAM TDP 180W Already tested on LG LLM. If they opened their programming interface the way NVIDIA opens PTX and Intel opens SPIR-V, and team up… 12
r/LocalLLaMA community 22d ago Apple announced new on device inference engine for Apple Silicon This news seem to have flown under the radar. Apple announced CoreAI on WWDC which is basically a future replacement for CoreML and an alternative to MLX/llama.cpp/torch for on-device optimized inference, especially on phones and tablets. The model weights need to be converted… 25
r/MachineLearning community 22d ago Are privacy-preserving techniques actually being used in production ML systems? [D] I've been reading more about privacy-preserving ML approaches such as differential privacy, federated learning, and on-device inference. The research literature is fairly active, but I'm curious about real-world adoption. For those working in industry: Are these techniques being… 16
r/LocalLLaMA community 23d ago Still a VERY lightweight open web-search tool for smaller local LLMs - now with SearXNG support Hey everyone, TinySearch v0.2.0 (first stable beta) is out. The first version used DuckDuckGo directly, which worked well enough to prove the idea, but yeah.. relying on one search source was way too fragile lol. DDG started throwing limits/CAPTCHAs more often in the last 2… 25
r/LocalLLaMA community 23d ago Gemma 4 31B's competence surprised me I'm just getting started using local LLMs for code. I'm not interested vibe coding, but I am hoping to increase my productivity in the publish or perish world of academia.
My existing code from past projects is a mess and LLMs often fail to understand my code because I work with… 38
arXiv — Machine Learning research 23d ago HASA: Subnet Allocation for Compute-Constrained Model-Heterogeneous Federated Learning arXiv:2606.07621v1 Announce Type: new Abstract: Edge services increasingly use federated learning to personalize on-device models while keeping sensitive data local. In practice, deployments must handle heterogeneity in both client resources and local data distributions.… 24