Tag: Edge
208 articles archived under #edge
r/LocalLLaMA community 52m ago Palantir CEO rages against closed models For context, this week they struck a deal to buy Nvidia chips and run local models for their enterprise clients. So in this video he is railing against Anthropic and OpenAI saying they are ripping everyone off while stealing their data too. Always a special moment when the enemy… 30
r/LocalLLaMA community 10h ago My reasons to run local models I can finetune any model on any dataset I want. I can use techniques like speculative decoding and other SOTA approaches to get the max tps. LLM providers like Anthropic and OpenAI are not getting access to my data. The hardware is reusable for vision, text, and speech, and I can run… 10
r/LocalLLaMA community 13h ago Open benchmark: how well can multimodal LLMs read a calendar week-view from a screenshot? Humans ~99%, Q4 local models..... Some backstory: I've been working on my local agent (openclaw), and I wanted to give it the skill to reconstruct calendar entries from a photo of the screen. I couldn't get at the calendar through an API (long story), so a photo was the only low-friction way to export the data.… 16
r/LocalLLaMA community 17h ago I mapped which local LLMs actually fit each RAM tier, 8 to 128GB (open dataset) I kept answering the same question for friends ("I've got a 16GB MacBook / a 3060, what can I actually run?") and got tired of guessing, so I started a spreadsheet. It grew into a real dataset, so I put it on GitHub under CC BY for anyone to use or fix. Rule of thumb I landed… 28
r/LocalLLaMA community 21h ago LokalBot - fully local macOS app: meetings, autocomplete, and day tracking that all run on your machine with a user friendly UI Been lurking here a while, this sub is basically why LokalBot exists.
It's a Mac app that records + summarizes your meetings, autocompletes your typing in any app, and tracks where your day went, with every model running on-device. No cloud, no account, no API keys. Most of the… 15
r/LocalLLaMA community 23h ago I built a desktop AI that scrubs your PII locally before it hits the cloud — here's every feature with real screenshots Been building this for a few months. It's called Primnox. The core thing: before ANY message leaves your machine, a local DeBERTa NER model runs on-device, finds names/emails/addresses/phone numbers, swaps them for stable placeholders (FIRSTNAME, EMAIL etc), sends the tokens to… 37
r/LocalLLaMA community 1d ago Ketch - Best Search Tool for local models Recently I wrote a blog post to find which search tool would be best for the pi coding agent paired with local models (currently I use Qwen3.6 35B). Before that I was using firecrawl or brave-search, but found them very decent, so I went to SearXNG, which is fine, but lacks some… 38
Hugging Face Daily Papers research 1d ago Little Brains, Big Feats: Exploring Compact Language Models Abstract: Small language models can effectively perform retrieval-augmented generation tasks directly on-device without GPU acceleration. Generated by Qwen/Qwen2.5-Coder-32B-Instruct. While large language models have been dominating the research landscape recently, small language… 13
r/LocalLLaMA community 1d ago I benchmarked full tool catalog vs ranked catalog on a local model: 8% → 77% accuracy Been running agents locally for a while and kept hitting the same issue: the more tools I added, the worse the model got at picking the right one. So I finally benchmarked it properly. Setup: qwen3.5-class model on an M4 MacBook, 100 tools in the catalog.
One run with the full… 23
r/LocalLLaMA community 2d ago Tesla V100 16GB local LLMs, single and dual NVLink benchmarks Picked up a couple of Tesla V100-SXM2-16GB modules a while back to run local models and drive Claude Code fully offline, figured the actual numbers and the traps might save someone else the pain. They've come right down in price and the 16GB of HBM2 at ~900 GB/s still holds up… 33
arXiv — NLP / Computation & Language research 2d ago MAM-AI: An On-Device Medical Retrieval-Augmented Generation System for Nurses and Midwives in Zanzibar arXiv:2606.29580v1 Announce Type: new Abstract: Maternal and newborn mortality remain among the highest in sub-Saharan Africa, where midwifery care is often delivered by nurses who lack midwifery training to international standards, and consulting authoritative guidance at the… 7
r/LocalLLaMA community 2d ago How I'm using local models from real-world coding Just want to share since after many attempts over the past year, I finally have a setup I kinda like and does useful work for me. I only have 32GB of RAM and a 4070 8GB (laptop), just very ordinary hardware. I found that Qwen3.6-35B-A3B runs reliably at about 15 tokens per… 25
r/LocalLLaMA community 2d ago I Hate Dario Amodei, and everything he stands for. I am so incredibly sick of this guy’s fear mongering about open source while fundamentally misunderstanding how it actually works. He recently dropped some arguments that are so completely detached from reality, it honestly feels like he’s never even touched a local model in his… 31
r/LocalLLaMA community 2d ago Anyone else end up building a web access layer for local AI agents? I've been running local models for most of my experiments, and I kept running into the same issue. The model lives locally, but everything it needs to interact with doesn't.
Every new agent ended up with another GitHub client, another Reddit integration, another documentation… 10
r/LocalLLaMA community 2d ago NASA testing local LLM inference for future space missions Red Hat published a blog post last week about an initiative I supported with NASA researchers at Johnson Space Center building a medical AI assistant. It's called the Crew Medical Officer Digital Assistant (CMO-DA) and the system runs LLMs and other models on local hardware with… 34
r/LocalLLaMA community 3d ago I built an agent Harness for Small Models. I got Qwen 3.5 4b managing servers. This is something I've been working on. I like playing around with smaller local models but found most agent harnesses are not well suited for them. The failure modes across different model families tend to be the same: failed tool calls; poor varication of environment variables; poor… 12
r/LocalLLaMA community 3d ago NPC Engine Using Local Models I’ve been working on a game-agnostic NPC engine/backend based pretty heavily on SillyTavern-style architecture, and with smaller local models getting better and better, I honestly think this kind of thing could be the future of RPGs. Right now I’m using NVIDIA Parakeet 0.6 for… 22
r/LocalLLaMA community 3d ago Best case for dual RTX 3090 (250W each) on Crosshair VIII Hero? I'm building a local LLM workstation and would appreciate some advice from people already running 2×3090s. Current hardware: ASUS Crosshair VIII Hero (X570); one Gainward Phoenix RTX 3090; looking for a second used 3090 (not necessarily the same model). Both GPUs will be… 9
r/LocalLLaMA community 4d ago I built a tool to turn your Claude Code sessions into fine-tuning data for local models If you use Claude Code, every session is already sitting on disk as a .jsonl file under ~/.claude/projects/. It has real coding conversations: multi-turn edits, tool calls, reasoning traces. That's training data you already generated for free.
The problem is the format is not… 36
r/LocalLLaMA community 4d ago Mythos was the first, now GPT-5.6 https://techcrunch.com/2026/06/26/openai-limits-gpt-5-6-rollout-after-government-request-says-restrictions-shouldnt-be-the-norm/ Either hype before the IPO, or they have just shot themselves in the foot. This is pretty much it for more advanced online models. Local LLM is one of the… 17
r/LocalLLaMA community 4d ago What’s the latest on agent browser use? What is the latest and greatest agent browser use framework? I remember trying browser use a few months back and it was ok but would fall apart after long workflows. Have there been improvements to agents controlling browsers and following a predefined workflow? Can local models… 32
r/LocalLLaMA community 4d ago Dear poor people of this subreddit I see people with multi-gpu setups but I'm sure there's a potato LLM runner out there somewhere. I have an old macbook pro (i5 8th gen, 8GB RAM) that I want to turn into a homelab. I want to run a small local model for experimenting and if possible, agentic tasks (like say… 22
r/LocalLLaMA community 5d ago Local LLM Peeps I am 80% done with a harness that works for local and API but is local first. The harness has some interesting logic around multiple agents which I’m holding back on until it is open source on GitHub. I have been local for 6 months and built out EVERYTHING I could think of to… 28
r/LocalLLaMA community 5d ago Streaming medical STT running locally on a MacBook Quick teaser of what I’ve been working on over the last few weeks: a streaming medical speech-to-text model that runs fully on-device. This demo is running locally on a MacBook through MLX. Still doing more evals, but planning to release the open weights next week.
… 22
r/LocalLLaMA community 5d ago Getting real work out of a 4B local model: the distill-on-idle pipeline behind an on-device "memory" assistant https://preview.redd.it/iiiqwt96tn9h1.png?width=3004&format=png&auto=webp&s=f02fba9f64e27ac91b2ae4cd478842106b294366 https://preview.redd.it/47cb5u96tn9h1.png?width=3024&format=png&auto=webp&s=b1cee93477970b8b0a636c37be657fecd38ba968… 7
r/LocalLLaMA community 5d ago What's one local AI workflow you wish you'd discovered sooner? There are a lot of posts about the models and benchmarks, but I am more interested in the workflows that people use. What is one workflow that really saved you time or made your local LLM more useful? It could be anything—RAG, MCP, coding agents, organizing prompt, document… 23
r/LocalLLaMA community 5d ago Help optimizing llama.cpp + Qwen 27B on RTX PRO 6000 Blackwell for coding agents Our company recently acquired a workstation with an RTX PRO 6000 Blackwell, and we're experimenting with local LLMs to reduce part of our Claude token usage. Right now we’re running Qwen3.6 27B MTP Q8_K_XL with llama.cpp on Windows 11. I've been using both Claude Opus and… 13
arXiv — Machine Learning research 6d ago Dot-Flik: A Scalable Edge AI Architecture for Distributed Insect Monitoring arXiv:2606.26121v1 Announce Type: cross Abstract: Global insect population declines necessitate scalable, continuous monitoring systems, yet existing vision-based solutions remain constrained by high hardware costs, energy demands, and reliance on centralized processing or cloud… 11
arXiv — NLP / Computation & Language research 6d ago AnySimLite: A Lightweight Few-Shot Similarity Encoder for On-Device Speech-Adjacent Classification arXiv:2606.26452v1 Announce Type: new Abstract: To minimize privacy concerns and inference latency on edge devices like smartphones, lightweight on-device models remain important for end-user applications.
Many of these applications involve natural language classification, but… 31
arXiv — NLP / Computation & Language research 6d ago Cascaded Multi-Granularity Pruning for On-Device LLM Inference in Industrial IoT arXiv:2606.26861v1 Announce Type: new Abstract: Deploying large language models (LLMs) on Industrial Internet of Things (IIoT) edge devices demands extreme compression, yet existing structured pruning methods collapse at high compression ratios due to one-shot importance… 27
r/LocalLLaMA community 6d ago Good YouTube channels for local LLM news and development? Sometimes I'd prefer chilling on the couch and learning instead of reading. I've searched on YouTube and most seem like clickbait and slop. Thanks. Submitted by /u/6jarjar6. 5
r/LocalLLaMA community 6d ago Built an open source local first Kanban workflow for running AI coding agents without babysitting every step I’ve been building BatonBot, a local first app for running AI coding workflows with less babysitting. The problem I kept running into, especially with local models, is that coding agents can be useful but the workflow gets slow: start task → wait → check output → fix next issue… 10
r/LocalLLaMA community 6d ago Prices of graphic cards are going crazy, should I buy a second card though? A few months ago, I bought an RX 7900 XTX 24g to start toying with local LLM, at 900€ new. Little did I know that I would now want to add a second card to my rig, but prices have gone insane! Adding a new 7900 XTX would cost me 1200€ as new now, used price is around 900€ now, and the last… 38
r/LocalLLaMA community 6d ago Fast medical RAG API to give your local LLMs access to facts I created a simple RAG API using medical Wikipedia articles that you can point your agent to and use freely. It may be useful in allowing your local LLMs access to medical facts they might not be able to recall from their weights.
I'm aiming for subsecond responses but cannot… 7
r/LocalLLaMA community 6d ago It turns out Bash is All You Need to write a language model REPL (and jq and curl) While working on a self-educational exercise tinkering with local models and trying my hand at setting up agents, I went down a rabbit hole: to see how far I could build a custom agent REPL loop using exclusively command-line building blocks and stripping out dependencies… 20
r/LocalLLaMA community 7d ago Has anyone tried to hack into their own system using a local model? With all this talk about Mythos being able to hack into US government systems, I was wondering if anyone has tried to get root on their own system using a local model? Submitted by /u/MrMrsPotts. 18
arXiv — Machine Learning research 7d ago On-Device Neural Architecture Search arXiv:2606.24900v1 Announce Type: new Abstract: This paper proposes a new approach to near-sensor computing, in which a lightweight Neural Architecture Search (NAS) is performed directly on the deployment device to find the best tiny neural architecture for analyzing the… 26
arXiv — Machine Learning research 7d ago Forget to Improve: On-Device LLM-Agent Continual Learning via Budget-Curated Memory arXiv:2606.25115v1 Announce Type: new Abstract: On-device language-model agents improve by accumulating experience in retrieved memory rather than by updating weights. This memory is hard-bounded and exposed: it consumes RAM and energy, reaches peers through a thin uplink, and… 24
r/LocalLLaMA community 7d ago I did some model hacks, and got GLM5.2 from about 2.5 tok/s to >50 tok/s on my GH200 system. G'day. This is part 3 on my Local LLM adventures.
I have a crazy hacked server-to-desktop system. Component / Spec: GPUs: 2x Hopper H100, 96 GB HBM3 each; CPUs: 2x Grace, 72 cores each; Host memory: 480 GB LPDDR5X per Grace, 960 GB total. So I can technically run GLM5.2.… 34
arXiv — Machine Learning research 8d ago Lightweight Transformer Models for On-Device Fault Detection: A Benchmark Study on Resource-Constrained Deployment arXiv:2606.24173v1 Announce Type: new Abstract: On-device fault detection enables real-time diagnostics without cloud dependency, but deploying machine learning models on resource-constrained hardware demands careful tradeoffs between accuracy, latency, and model size. We… 14
arXiv — Machine Learning research 8d ago EnerInfer: Energy-Aware On-Device LLM Inference arXiv:2606.23001v1 Announce Type: cross Abstract: On-device LLM inference is increasingly attractive for privacy-preserving, reliable, and cost-effective deployment, yet its energy and thermal costs remain a critical bottleneck. Existing systems primarily optimize for decoding… 13
r/LocalLLaMA community 8d ago 650+ Apache-2.0 biomedical NER/de-id models that run on-device in MLX. Same fp32 weights, identical outputs: the clinical NER models run 30-40x faster than PyTorch-CPU on a 3-year-old M3 Max. Repro inside. Disclosure first: I maintain OpenMed, so read this with that bias. I'm posting the numbers with the full methodology and a runnable script so you can reproduce or tear it apart. I'm here for the next couple of hours to answer methodology questions. What it is: an open-source… 25
r/LocalLLaMA community 8d ago My local server idling 99% of the time! Guys, what are you running to keep your agents busy? Like some crazy 24/7 tasks, or maybe some useful ideas on how to utilize a local LLM with some purpose/use?
I'm personally running Qwen3.6-27B with owu and with pi for coding (little-coder) but as in the title, it’s idling all the time… 33
r/LocalLLaMA community 9d ago been tracking EU DDR5 data for 25 days: Prices are dropping, and the DE vs. NL gap is wild (good news for local LLM builders in EU) hey again! been tracking DDR5 prices across 4 EU countries (DE, NL, ES, BE) for the past month. some findings relevant to local LLM builders: prices are falling: G.Skill DDR5 Aegis 2x16GB 6000: -28% in 25 days (€579 → €419); Kingston FURY Beast RGB 2x16GB 6000: -26% (€499 → €369)… 37
r/LocalLLaMA community 10d ago Do you think dedicated hardware for running local LLMs will become affordable anytime soon? Models like qwen 27b dense have already proved to be useful coding/general purpose assistants, but the issue is still hardware: even entry-level hardware is relatively expensive. Would we be getting hardware specifically built for inference for consumers at affordable price… 6
r/LocalLLaMA community 10d ago For programmers with slow local LLM setup, what's your workflow? What's your workflow and what's the best way you have found to code with local LLM when your token generation is < 10 tk/sec? Submitted by /u/segmond. 14
Hugging Face official-blog 10d ago We got local models to triage the OpenClaw repo for FREE!* Published June 22, 2026. Authors: Onur Solmaz (osolmaz), ben burtenshaw (burtenshaw), shaun smith (evalstate), Pedro Cuenca (pcuenq), Lysandre (lysandre). *Free as in beer, excluding the… 30
r/LocalLLaMA community 10d ago Local LLM Inference Optimization: The Complete Guide I compiled a year of local LLM experiments into a practical llama.cpp optimization guide, covering VRAM fitting, KV cache, MoE placement, MTP, CPU tuning, and common OOM traps.
Pass this to an LLM of your choice and get on the local model train.… 4
r/LocalLLaMA community 10d ago Local text to image model comparaison: The ultimate test. I selected 192 prompts to evaluate various text-to-image model capabilities and generated images for all the local models I was able to make work on my GX10 Spark. For instance: Is the model good at text? At faces? At human anatomy? At respecting spatial composition, etc.? You… 4
r/LocalLLaMA community 10d ago Best local model for vision - 2nd benchmark update - 21 Jun 2026 I previously posted the first results of my VLM benchmark. There were a few useful comments and observations I took into account, to revise and expand my benchmark: I initially did not take into account the Gemma 4 vision budget which defaults to 280, essentially making it… 9
Page 1 of 5 · 208 articles