Tag: Edge · 208 articles archived under #edge

r/LocalLLaMA community 23d ago LocalLLaMA post tier list Since there is much (justified) whining about post quality, I thought it would be helpful to get a sense of what people actually DO like. Here's my take: S-tier: - GGUFs/MLX or benchmark data for new best-in-class local model released - New Optimizations that are actually a big… 17

r/LocalLLaMA community 23d ago Friends from the localllama community, if you love local llm, don't participate in the IPO (SpaceX, OpenAI, Anthropic) I'm not going to. And you shouldn't either. The frontier labs are the ones who are harming our community. They are jacking the hardware prices up. First it was nvidia GPUs. And then it was RAM. And then SSD. And now HDD prices are 3x compared to last year. Even NAS prices are… 35

r/LocalLLaMA community 23d ago I bundled a fully local LLM inside my Unity game. No internet, no cloud, no API key. The conversation is the gameplay. I am making a game that is bundled with a local LLM and every conversation is unique. The game, 'Simulation Simulator', is a campfire chat sim game about DMT, simulation theory, and a friend with a computer monitor for a head. 5 endings you can reach totally based on how you… 27

r/LocalLLaMA community 24d ago Galaxy Z Fold6 as a local inference node — llama.cpp/Vulkan, homelab telemetry, SHA-256 model verification Built a small Android app called Pocket Node that runs llama.cpp inference on-device. Here's what it actually does and what it doesn't. **What it does** * Loads a GGUF model (SmolLM3 Q4_0, ~1.1B params) directly on the Fold6 * Uses the Vulkan/OpenCL backend via llama.cpp — not… 12

r/LocalLLaMA community 24d ago Qwen3.6 35B-A3B on a Laptop: My Zero to One Moment Hi everyone, I'm new here - because I only have a laptop and I only just realized local models are actually good enough now.
So I'd like to share my experience, in case it helps others, and also to learn from the more experienced people here. This is the first model that works… 29

r/LocalLLaMA community 25d ago Are local models good enough to replace Claude/Codex solely for simple HTML tasks? I know local models can’t compete fully yet, but I’m curious about where the limits are. My use case is generating simple HTML activities for elearning creation purposes. I know others are creating apps and more advanced software. Where are the limits for where local models can… 20

r/LocalLLaMA community 25d ago RTX 3090 eBay Pricing is Crazy!! A couple of years ago, before local LLMs were in vogue, I bought 8 RTX 3090s @ $700 each to build an AI rig. It's been working great and I was looking to build another to increase my capacity, but looking at eBay, those are now selling in the $1,300-$1,500 range! That price seems totally… 17

r/LocalLLaMA community 25d ago Best Coding Harness for Qwen3.6 35B? I've been happily using GitHub Copilot for 7-8 months, primarily in Visual Studio and VS Code, mostly with the built-in flagship models and have felt like the output is worth the cost. Lately I've been playing with a lot of different local LLM models and decided to try using… 32

r/LocalLLaMA community 25d ago Experimentation with Qwen 3.6 and Gemma 4 - Guidance needed I’m a web developer doing mostly coding, but also project management, requirements analysis, testing, etc. I recently started experimenting with local LLMs, mostly because agentic stuff finally made them feel useful.
Note: This text was fed to ChatGPT to fix my messy repeating… 32

r/LocalLLaMA community 26d ago AA comparison of the latest local models I picked models I consider local (usable on 3×3090), so there are no 300B models, and you should probably skip 200B models too (but MiniMax and Step are pretty fast in Q3). Gemma-4 12B is still missing. submitted by /u/jacek2023 15

r/LocalLLaMA community 26d ago OpenLumara - A different kind of AI agent, written from scratch, not vibecoded. Extremely token-efficient, super small system prompt, made for local models. Everything is modular. Hi locallama community! Yes, I know, yet another AI agent announcement post. They are a dime a dozen out there... most of them, though, are vibecoded, often very sloppy, and eat through context like no tomorrow. This is different. This runs beautifully and very fast with local… 9

r/MachineLearning community 27d ago Are We Underestimating Small Edge AI Models? [D] A lot of recent discussion around Edge AI focuses on running increasingly larger local LLMs. Meanwhile modern smartphones already have enough compute for many practical computer vision tasks that don't require massive models at all. I recently built and released an Android… 7

r/LocalLLaMA community 27d ago Run (your largest) local models from your iPhone submitted by /u/BustyMeow 18

llama.cpp releases dev-tools 27d ago b9518 server : disable on-device spec checkpoints (#24108) macOS/iOS: macOS Apple Silicon (arm64), macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED, macOS Intel (x64), iOS XCFramework. Linux: Ubuntu x64 (CPU), Ubuntu arm64 (CPU), Ubuntu s390x (CPU), Ubuntu x64 (Vulkan), Ubuntu arm64… 15

r/LocalLLaMA community 27d ago I accidentally crippled my 4x RTX 3090 LLM rig with a hidden PCIe 2.0 x4 slot and fixing it doubled Mistral 128B performance I’m posting this as a warning for anyone building multi-GPU local LLM rigs with older workstation/HEDT boards.
My setup (Node #04): Gigabyte X399 Designare EX, Threadripper 1950X, 128GB DDR4, 4x RTX 3090, 10GbE TP-Link/Aquantia NIC, llama.cpp NCCL build, vLLM for safetensors models. I… 15

The Information — AI news-outlet 28d ago Apple to Launch New Siri in September With Help of Google, Nvidia Apple is currently on track to launch its overhauled Siri in September, to run in part on Google’s cloud computing servers using Nvidia chips, according to people familiar with the matter. While Apple will try to run as much as possible of the new Siri on devices such as… 31

r/LocalLLaMA community 28d ago Big Model Value Wars - DeepSeek V4 Pro vs MiMo-V2.5-Pro vs MiniMax M3 For those who sometimes boost their local model use with openrouter options, or the madlads who have the infrastructure to actually run those locally, it feels like those three models have the edge in best bang for your buck. How then do you decide which one to use? Do you have a… 19

r/LocalLLaMA community 28d ago Best way to index full Italian Wikipedia for 100% offline RAG in LM Studio? Hi everyone, I want to set up a 100% offline RAG system using LM Studio and the entire Italian Wikipedia (text-only, no images). My goal is to index the database once so my local LLMs can query it for up-to-date factual knowledge without internet access. Here are my PC specs:… 14

r/LocalLLaMA community 29d ago Microsoft Aion 1.0 Instruct and Aion 1.0 Plan models! Microsoft announced 2 new on-device models at Microsoft Build 2026. Aion 1.0 Instruct: efficiency at scale. Aion 1.0 Instruct is our next-generation small language model, smaller, faster and more efficient than our current Windows OS SLM. Designed from the ground up for… 14

r/LocalLLaMA community 29d ago I Put a Datacenter GPU in My Gaming PC for £200 Hey there! I wrote a blogpost about my experience running local models on a V100 from a newbie perspective and got loads of views outside of reddit, so I thought I'd share it here too!
submitted by /u/tymscar 33

r/LocalLLaMA community 29d ago What are you using to preprocess pdfs before feeding them to a local model? I have been running a local setup for document QA and the output quality varies a lot depending on what the pdf looks like when it hits the LLM. clean prose docs are fine but anything with tables or multi column layouts comes out garbled and the model just works with whatever… 37

r/LocalLLaMA community 1mo ago Ignoring benchmarks, how do the newest local models (gemma 4 31B, 26BA4B, Qwen 3.6) “feel” to you? What do you think they compare to? I use local ai mainly for creative writing, and benchmarks are a bit iffy on that I feel like. I’d like to compare Gemma mainly to Gemini as I like their writing the best, I do know that qwen 3.6 is amazing but mostly for coding and agentic work. I’d like to ask everyone how the… 30

r/LocalLLaMA community 1mo ago Replaced Claude with local Qwen3.6-27B in my multi-agent orchestrator for 2 weeks For two weeks I ran my multi-agent orchestrator entirely on Qwen3.6-27B via Ollama, on a single 3090. The goal: see if a local model could replace Claude as the reasoning layer for the lead/manager/sub-agent loop. Here's where it worked and where it broke. Setup: - RTX 3090,… 13

r/LocalLLaMA community 1mo ago Man trains local model to detect and kill mosquitos with a laser Now this is local AI innovation we can all get behind. https://x.com/stevencheng/status/2059836738449854898 submitted by /u/No_Information9314 37

r/LocalLLaMA community 1mo ago Stop asking what model to run. There are literally only two. Can we please ban the daily "I have an RTX 3060, what should I run?" slop threads? It’s not complicated. As of right now, Hugging Face is empty and exactly two local models exist on this entire planet: Qwen 3.6 35b a3b and Qwen 3.6 27b. That is the entire list.
Your specs don’t… 30

arXiv — NLP / Computation & Language research 1mo ago The Architecture of Errors: From Universal Impossibility to Patch-Local LLM Reliability arXiv:2605.30628v1 Announce Type: new Abstract: Universal LLM reliability is not a finite-library problem: across all possible tasks, tools, schemas, knowledge sources, and evaluator expectations, new intervention-distinguishable failure modes can appear without bound, so no… 38

arXiv — NLP / Computation & Language research 1mo ago Translation Analytics for Freelancers II: Benchmarking Local LLMs for Confidential Translation Workflows arXiv:2605.31452v1 Announce Type: new Abstract: Building on our previous work, this paper develops practical, low-barrier methods for freelance translators and smaller language service providers to evaluate translation technologies using rigorous yet accessible analytic methods.… 23

Hugging Face Daily Papers research 1mo ago From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors Abstract: Multi-step trojan attacks in local LLM agents can bypass existing defenses by embedding malicious prompts across multiple operations, requiring new detection methods like DASGuard for effective protection. AI-generated summary: LLM agents are evolving from conversational… 20

r/LocalLLaMA community 1mo ago Don’t bite me for that question please… And question is… How you earning money on your local llm setups? (Except coding ofc) I see people spending SO MUCH MONEY on the compute power to run llms locally and many of them saying that their setups already paid for themselves or they earning much more (I guess they not mean… 29

r/MachineLearning community 1mo ago I built mlx-Chronos — a community benchmark leaderboard for local LLM engines on Apple Silicon (oMLX, Rapid-MLX, mlx-lm, Ollama) [P] Hey!
I'm a CS student and I got tired of not being able to compare MLX inference engines properly — every benchmark out there is either made by the engine's own developers, runs on an M3 Ultra nobody has, or just shows tok/s with zero context. So I built mlx-Chronos — a small… 11

r/LocalLLaMA community 1mo ago what do you use your local llm? what do you use your local llm for? for me, i run everything on linux and it ends up generating api tokens i can plug into other stuff. on my laptop (and for personal projects), i mostly use it for coding help—then i’ve got an ai agent (not openclaw) that monitors stock prices… 34

r/LocalLLaMA community 1mo ago How do I try to run Gemma 4 31B at Q8 quantization? Only seeing Q4_K_M on Ollama Just got my new PC up and running and want to test some local models. I'm a complete noob but I've managed to install ollama. I'm on Fedora Linux. submitted by /u/JayoTree 9

r/LocalLLaMA community 1mo ago Cost Analysis of my $6.4k Local LLM Server I haven't seen any of these done, so I just wanted to share my experience in case it is useful for anyone. The purpose of this post is to show total cost of ownership of my local llm server versus API equivalent. Before you look at the final numbers, note that most people do not… 17

r/LocalLLaMA community 1mo ago Why does Thinking Output More Tokens Than a Response? I was too lazy to use a vector DB + Embedding + Clustering for this list of 1000 items I wanted to categorize. I was hoping to use a local LLM to do it, but it would only respond with a list of about 100 items or so and their categories. It confused me because when I saw the… 22

r/LocalLLaMA community 1mo ago Shoutout to Gemma4 as a conversational assistant / agent I'm seriously impressed by Gemma4 26B A4B. On my M5 Pro (so not much memory bandwidth by GPU standards), it's blazingly fast and it's a very good generalist / everyday local LLM.
It has a little bit of personality to its responses, and seems to perform decently for everything:… 37

r/LocalLLaMA community 1mo ago LiquidAI/LFM2.5-8B-A1B · Hugging Face looks like you can run it on any potato (A1B)! https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-GGUF from LiquidAI: LFM2.5 is a new family of hybrid models designed for on-device deployment. It builds on the LFM2 architecture with extended pre-training and reinforcement learning.… 22

r/LocalLLaMA community 1mo ago Qwen3.6 35B - TXT vs Markdown vs HTML vs HTML+CSS There's been talk of late about using HTML rather than markdown in Claude Code. I was curious how this worked with a local model, so I loaded up Qwen3.6 35B A3B at Q8 and F16 KV cache. Then I gave it the same prompt: write a detailed explanation of the Blazor render cycle first… 31

The Information — AI news-outlet 1mo ago Apple to Renew Push for AI That Runs on Devices, Instead of the Cloud At Apple’s annual developer conference next month, the star of the show will be a series of long-delayed artificial intelligence upgrades to the iPhone. But the company is also expected to emphasize what could be an underrated asset in its efforts to catch up in AI: Its ability… 26

r/LocalLLaMA community 1mo ago Heterogeneous GPU Weighting & Layer Splitting This is what I worked on today. With local LLM of course. So if I didn't write the code, did I really work on it? Who cares. It was my idea and I simply asked it to implement it.
I basically downloaded /main/ branch, which is totally broken for Windows by the way (i had to… 21

arXiv — Machine Learning research 1mo ago The Energy Blind Spot: NVIDIA's Flagship Edge AI Hardware Cannot Support Process-Level Energy Attribution arXiv:2605.27599v1 Announce Type: new Abstract: Agentic AI workloads - where a single user goal triggers multi-step orchestration, tool calls, retries, and failure recovery - are being targeted for edge deployment, with NVIDIA, Dell, HP, ASUS, MSI, Acer, and Gigabyte all… 37

r/LocalLLaMA community 1mo ago Local LLMs on Refurb M4 Max vs new M5 Max Hoping the community can guide me on this one. I'm on the fence about the following purchase: Refurbished 16-inch MacBook Pro Apple M4 Max Chip with 16‑Core CPU and 40‑Core GPU, 64GB RAM, 1TB drive for $3,479.00 vs The new 16-inch MacBook Pro Apple M5 Max Chip with 18‑core CPU,… 30

r/LocalLLaMA community 1mo ago CrankGPT by Squeez Labs - hand-cranked edge AI - talk about local AI!!! I met Katrin from Squeez Labs at an event hosted by Pathway AI (the team behind Baby Dragon Hatchling) where she told me about CrankGPT, a literally hand-cranked device for running local LLMs. It's apparently real. It's apparently launched. It's apparently glorious. Check it… 15

r/LocalLLaMA community 1mo ago Qwen3.6 huge quality gain from Q4 to Q6 for coding agent So, last week I tried to update my unused local LLM setup. I had to stop using it because quality was too low and deepseek was too cheap. First thing, I stopped using Ollama and now I only use the llama.cpp built-in server, which works really great. The quality improvement from Q4 to… 34

arXiv — Machine Learning research 1mo ago The Constraint Tax: Measuring Validity-Correctness Tradeoffs in Structured Outputs for Small Language Models arXiv:2605.26128v1 Announce Type: new Abstract: Production LLM systems increasingly require machine-readable outputs: JSON objects, typed traces, regex-constrained fields, and tool-call schemas.
This paper targets on-device and low-cost small language model (SLM) deployments,… 24

arXiv — Machine Learning research 1mo ago Dense2MoE: Pushing the Pareto Frontier of On-Device LLMs via Unified Pruning and Upcycling arXiv:2605.26496v1 Announce Type: new Abstract: The Mixture of Experts (MoE) architecture is highly promising for resource-constrained on-device deployments, yet training these models from scratch incurs prohibitive costs. Current methods attempt to alleviate this by upcycling dense… 32

Hugging Face Daily Papers research 1mo ago MobileMoE: Scaling On-Device Mixture of Experts Abstract: MobileMoE introduces efficient on-device Mixture-of-Experts language models with sub-billion parameters that achieve better performance and efficiency compared to dense baselines and existing MoE models. AI-generated summary: Mixture-of-Experts (MoE) has become the de… 17

arXiv — Machine Learning research 1mo ago Signs Beat Floats: Low-Rank Double-Binary Adaptation for On-Device Fine-Tuning arXiv:2605.24058v1 Announce Type: new Abstract: On-device adaptation of large language models commonly keeps a quantized base model frozen while training and deploying a small, task-specific LoRA adapter. In the unmerged adapter-mode setting, however, the adapter is more than a… 28

r/LocalLLaMA community 1mo ago New local model reaching near frontier on PII removal at 9 ms CPU inference Hi all, I've been working on this model to strip sensitive information from computer use data and would love some feedback!
submitted by /u/louis3195 34

r/LocalLLaMA community 1mo ago Using Local LLMs for Generating Custom Interactive Recursive Textbooks on the Fly submitted by /u/Ryoiki-Tokuiten 28

llama.cpp releases dev-tools 1mo ago b9315 llama : document that only one on-device state can be saved per sequence (#23520) macOS/iOS: macOS Apple Silicon (arm64), macOS Apple Silicon (arm64, KleidiAI enabled), macOS Intel (x64), iOS XCFramework. Linux: Ubuntu x64 (CPU), Ubuntu arm64 (CPU), Ubuntu s390x (CPU), Ubuntu x64… 13

Page 3 of 5