Tag

Edge

208 articles archived under #edge

r/LocalLLaMA community 23d ago

LocalLLaMA post tier list

Since there is much (justified) whining about post quality, I thought it would be helpful to get a sense of what people actually DO like. Here's my take: S-tier: -GGUFs/MLX or benchmark data for new best-in-class local model released - New Optimizations that are actually a big…

17
r/LocalLLaMA community 23d ago

Friends from the localllama community, if you love local llm, don't participate in the IPO (spaceX, OpenAI, Anthropic)

I'm not going to. And you shouldn't either. The frontier labs are the ones who are harming our community. They are jacking the hardware prices up. First it was Nvidia GPUs. And then it was RAM. And then SSDs. And now HDD prices are 3x compared to last year. Even NAS prices are…

35
r/LocalLLaMA community 23d ago

I bundled a fully local LLM inside my Unity game. No internet, no cloud, no API key. The conversation is the gameplay.

I am making a game that is bundled with a local LLM and every conversation is unique. The game, 'Simulation Simulator', is a campfire chat sim game about DMT, simulation theory, and a friend with a computer monitor for a head. 5 endings you can reach totally based on how you…

27
r/LocalLLaMA community 24d ago

Galaxy Z Fold6 as a local inference node — llama.cpp/Vulkan, homelab telemetry, SHA-256 model verification

Built a small Android app called Pocket Node that runs llama.cpp inference on-device. Here's what it actually does and what it doesn't. What it does: loads a GGUF model (SmolLM3 Q4_0, ~1.1B params) directly on the Fold6; uses the Vulkan/OpenCL backend via llama.cpp, not…

12
r/LocalLLaMA community 24d ago

Qwen3.6 35B-A3B on a Laptop: My Zero to One Moment

Hi everyone, I'm new here - because I only have a laptop and I only just realized local models are actually good enough now. So I'd like to share my experience, in case it helps others, and also to learn from the more experienced people here. This is the first model that works…

29
r/LocalLLaMA community 25d ago

Are local models good enough to replace Claude/Codex solely for simple HTML tasks?

I know local models can’t compete fully yet, but I’m curious about where the limits are. My use case is generating simple HTML activities for elearning creation purposes. I know others are creating apps and more advanced software. Where are the limits for where local models can…

20
r/LocalLLaMA community 25d ago

RTX 3090 EBay Pricing is Crazy!!

A couple of years ago, before local LLMs were in vogue, I bought 8 RTX 3090s at $700 each to build an AI rig. It has been working great, and I was looking to build another to increase my capacity, but on eBay those are now selling in the $1,300-$1,500 range! That price seems totally…

17
r/LocalLLaMA community 25d ago

Best Coding Harness for Qwen3.6 35B?

I've been happily using GitHub Copilot for 7-8 months, primarily in Visual Studio and VS Code, mostly with the built-in flagship models and have felt like the output is worth the cost. Lately I've been playing with a lot of different local LLM models and decided to try using…

32
r/LocalLLaMA community 25d ago

Experimentation with Qwen 3.6 and Gemma 4 - Guidance needed

I’m a web developer doing mostly coding, but also project management, requirements analysis, testing, etc. I recently started experimenting with local LLMs, mostly because agentic stuff finally made them feel useful. Note: This text was fed to ChatGPT to fix my messy repeating…

32
r/LocalLLaMA community 26d ago

AA comparison of the latest local models

I picked models I consider local (usable on 3×3090), so there are no 300B models, and you should probably skip 200B models too (but MiniMax and Step are pretty fast in Q3). Gemma-4 12B is still missing. Submitted by /u/jacek2023.

15
r/LocalLLaMA community 26d ago

OpenLumara - A different kind of AI agent, written from scratch, not vibecoded. Extremely token-efficient, super small system prompt, made for local models. Everything is modular.

Hi locallama community! Yes, I know, yet another AI agent announcement post. There are a dime a dozen out there... most of them though, are vibecoded, often very sloppy, and eat through context like no tomorrow. This is different. This runs beautifully and very fast with local…

9
r/MachineLearning community 27d ago

Are We Underestimating Small Edge AI Models?[D]

A lot of recent discussion around Edge AI focuses on running increasingly larger local LLMs. Meanwhile modern smartphones already have enough compute for many practical computer vision tasks that don't require massive models at all. I recently built and released an Android…

7
r/LocalLLaMA community 27d ago

Run (your largest) local models from your iPhone

Submitted by /u/BustyMeow.

18
llama.cpp releases dev-tools 27d ago

b9518

server : disable on-device spec checkpoints ( #24108 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

15
r/LocalLLaMA community 27d ago

I accidentally crippled my 4x RTX 3090 LLM rig with a hidden PCIe 2.0 x4 slot and fixing it doubled Mistral 128B performance

I’m posting this as a warning for anyone building multi-GPU local LLM rigs with older workstation/HEDT boards. My setup (Node #04): Gigabyte X399 Designare EX, Threadripper 1950X, 128GB DDR4, 4x RTX 3090, 10GbE TP-Link/Aquantia NIC, llama.cpp NCCL build, vLLM for safetensors models. I…

15
The Information — AI news-outlet 28d ago

Apple to Launch New Siri in September With Help of Google, Nvidia

Apple is currently on track to launch its overhauled Siri in September, to run in part on Google’s cloud computing servers using Nvidia chips, according to people familiar with the matter. While Apple will try to run as much as possible of the new Siri on devices such as…

31
r/LocalLLaMA community 28d ago

Big Model Value Wars - DeepSeek V4 Pro vs MiMo-V2.5-Pro vs MiniMax M3

For those who sometimes boost their local model use with OpenRouter options, or the madlads who have the infrastructure to actually run those locally, it feels like those three models have the edge in best bang for your buck. How then do you decide which one to use? Do you have a…

19
r/LocalLLaMA community 28d ago

Best way to index full Italian Wikipedia for 100% offline RAG in LM Studio?

Hi everyone, I want to set up a 100% offline RAG system using LM Studio and the entire Italian Wikipedia (text-only, no images). My goal is to index the database once so my local LLMs can query it for up-to-date factual knowledge without internet access. Here are my PC specs:…

14
r/LocalLLaMA community 29d ago

Microsoft Aion 1.0 Instruct and Aion 1.0 Plan models!

Microsoft announced 2 new on-device models at Microsoft Build 2026. Aion 1.0 Instruct: efficiency at scale. Aion 1.0 Instruct is our next-generation small language model, smaller, faster and more efficient than our current Windows OS SLM. Designed from the ground up for…

14
r/LocalLLaMA community 29d ago

I Put a Datacenter GPU in My Gaming PC for £200

Hey there! I wrote a blog post about my experience running local models on a V100 from a newbie perspective and got loads of views outside of Reddit, so I thought I'd share it here too! Submitted by /u/tymscar.

33
r/LocalLLaMA community 29d ago

What are you using to preprocess pdfs before feeding them to a local model?

I have been running a local setup for document QA and the output quality varies a lot depending on what the pdf looks like when it hits the LLM. clean prose docs are fine but anything with tables or multi column layouts comes out garbled and the model just works with whatever…

37
r/LocalLLaMA community 1mo ago

Ignoring benchmarks, how do the newest local models (gemma 4 31B, 26BA4B, Qwen 3.6) “feel” to you? What do you think they compare to?

I use local ai mainly for creative writing, and benchmarks are a bit iffy on that I feel like. I’d like to compare Gemma mainly to Gemini as I like their writing the best, I do know that qwen 3.6 is amazing but mostly for coding and agentic work. I’d like to ask everyone how the…

30
r/LocalLLaMA community 1mo ago

Replaced Claude with local Qwen3.6-27B in my multi-agent orchestrator for 2 weeks

For two weeks I ran my multi-agent orchestrator entirely on Qwen3.6-27B via Ollama, on a single 3090. The goal: see if a local model could replace Claude as the reasoning layer for the lead/manager/sub-agent loop. Here's where it worked and where it broke. Setup: - RTX 3090,…

13
r/LocalLLaMA community 1mo ago

Man trains local model to detect and kill mosquitos with a laser

Now this is local AI innovation we can all get behind. https://x.com/stevencheng/status/2059836738449854898 Submitted by /u/No_Information9314.

37
r/LocalLLaMA community 1mo ago

Stop asking what model to run. There are literally only two.

Can we please ban the daily "I have an RTX 3060, what should I run?" slop threads? It’s not complicated. As of right now, Hugging Face is empty and exactly two local models exist on this entire planet: Qwen 3.6 35b a3b Qwen 3.6 27b That is the entire list. Your specs don’t…

30
arXiv — NLP / Computation & Language research 1mo ago

The Architecture of Errors: From Universal Impossibility to Patch-Local LLM Reliability

arXiv:2605.30628v1 Announce Type: new Abstract: Universal LLM reliability is not a finite-library problem: across all possible tasks, tools, schemas, knowledge sources, and evaluator expectations, new intervention-distinguishable failure modes can appear without bound, so no…

38
arXiv — NLP / Computation & Language research 1mo ago

Translation Analytics for Freelancers II: Benchmarking Local LLMs for Confidential Translation Workflows

arXiv:2605.31452v1 Announce Type: new Abstract: Building on our previous work, this paper develops practical, low-barrier methods for freelance translators and smaller language service providers to evaluate translation technologies using rigorous yet accessible analytic methods.…

23
Hugging Face Daily Papers research 1mo ago

From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors

Abstract: Multi-step trojan attacks in local LLM agents can bypass existing defenses by embedding malicious prompts across multiple operations, requiring new detection methods like DASGuard for effective protection. AI-generated summary: LLM agents are evolving from conversational…

20
r/LocalLLaMA community 1mo ago

Don’t bite me for that question please…

And the question is… How are you earning money on your local LLM setups? (Except coding, of course.) I see people spending SO MUCH MONEY on the compute power to run LLMs locally, and many of them say that their setups have already paid for themselves or that they are earning much more (I guess they don't mean…

29
r/MachineLearning community 1mo ago

I built mlx-Chronos — a community benchmark leaderboard for local LLM engines on Apple Silicon (oMLX, Rapid-MLX, mlx-lm, Ollama) [P]

Hey! I'm a CS student and I got tired of not being able to compare MLX inference engines properly — every benchmark out there is either made by the engine's own developers, runs on an M3 Ultra nobody has, or just shows tok/s with zero context. So I built mlx-Chronos — a small…

11
r/LocalLLaMA community 1mo ago

what do you use your local llm?

what do you use your local llm for? for me, i run everything on linux and it ends up generating api tokens i can plug into other stuff. on my laptop (and for personal projects), i mostly use it for coding help—then i’ve got an ai agent (not openclaw ) that monitors stock prices…

34
r/LocalLLaMA community 1mo ago

How do I try to run Gemma 4 31B at Q8 quantization? Only seeing Q4_K_M on Ollama

Just got my new PC up and running and want to test some local models. I'm a complete noob but I've managed to install Ollama. I'm on Fedora Linux. Submitted by /u/JayoTree.

9
r/LocalLLaMA community 1mo ago

Cost Analysis of my $6.4k Local LLM Server

I haven't seen any of these done, so I just wanted to share my experience in case it is useful for anyone. The purpose of this post is to show total cost of ownership of my local llm server versus API equivalent. Before you look at the final numbers, note that most people do not…

17
r/LocalLLaMA community 1mo ago

Why does Thinking Output More Tokens Than a Response?

I was too lazy to use a vector DB + Embedding + Clustering for this list of 1000 items I wanted to categorize. I was hoping to use a local LLM to do it, but it would only respond with a list of about 100 items or so and their categories. It confused me because when I saw the…

22
r/LocalLLaMA community 1mo ago

Shoutout to Gemma4 as a conversational assistant / agent

I'm seriously impressed by Gemma4 26B A4B. On my M5 Pro (so not much memory bandwidth by GPU standards), it's blazingly fast and it's a very good generalist / everyday local LLM. It has a little bit of personality to its responses, and seems to perform decently for everything:…

37
r/LocalLLaMA community 1mo ago

LiquidAI/LFM2.5-8B-A1B · Hugging Face

looks like you can run it on any potato (A1B)! https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-GGUF from LiquidAI: LFM2.5 is a new family of hybrid models designed for on-device deployment. It builds on the LFM2 architecture with extended pre-training and reinforcement learning.…

22
r/LocalLLaMA community 1mo ago

Qwen3.6 35B - TXT vs Markdown vs HTML vs HTML+CSS

There's been talk of late about using HTML rather than Markdown in Claude Code. I was curious how this worked with a local model, so I loaded up Qwen3.6 35B A3B at Q8 with an F16 KV cache. Then I gave it the same prompt: write a detailed explanation of the Blazor render cycle first…

31
The Information — AI news-outlet 1mo ago

Apple to Renew Push for AI That Runs on Devices, Instead of the Cloud

At Apple’s annual developer conference next month, the star of the show will be a series of long-delayed artificial intelligence upgrades to the iPhone. But the company is also expected to emphasize what could be an underrated asset in its efforts to catch up in AI: Its ability…

26
r/LocalLLaMA community 1mo ago

Heterogeneous GPU Weighting & Layer Splitting

This is what I worked on today. With local LLM of course. So if I didn't write the code, did I really work on it? Who cares. It was my idea and I simply asked it to implement it. I basically downloaded /main/ branch, which is totally broken for Windows by the way (i had to…

21
arXiv — Machine Learning research 1mo ago

The Energy Blind Spot: NVIDIA's Flagship Edge AI Hardware Cannot Support Process-Level Energy Attribution

arXiv:2605.27599v1 Announce Type: new Abstract: Agentic AI workloads - where a single user goal triggers multi-step orchestration, tool calls, retries, and failure recovery - are being targeted for edge deployment, with NVIDIA, Dell, HP, ASUS, MSI, Acer, and Gigabyte all…

37
r/LocalLLaMA community 1mo ago

Local LLMs on Refurb M4 Max vs new M5 Max

Hoping the community can guide me on this one. I'm on the fence about the following purchase: Refurbished 16-inch MacBook Pro Apple M4 Max Chip with 16‑Core CPU and 40‑Core GPU, 64gb ram, 1Tb Drv for $3,479.00 vs The new 16-inch MacBook Pro Apple M5 Max Chip with 18‑core CPU,…

30
r/LocalLLaMA community 1mo ago

CrankGPT by Squeez Labs - hand-cranked edge AI - talk about local AI!!!

I met Katrin from Squeez Labs at an event hosted by Pathway AI (the team behind Baby Dragon Hatchling) where she told me about CrankGPT, a literally hand-cranked device for running local LLMs. It's apparently real. It's apparently launched. It's apparently glorious. Check it…

15
r/LocalLLaMA community 1mo ago

Qwen3.6 huge quality gain from Q4 to Q6 for coding agent

So, last week I tried to update my unused local LLM setup. I had stopped using it because the quality was too low and DeepSeek was too cheap. First thing: I stopped using Ollama, and now I only use llama.cpp's built-in server, which works really well. The quality improvement from Q4 to…

34
arXiv — Machine Learning research 1mo ago

The Constraint Tax: Measuring Validity-Correctness Tradeoffs in Structured Outputs for Small Language Models

arXiv:2605.26128v1 Announce Type: new Abstract: Production LLM systems increasingly require machine-readable outputs: JSON objects, typed traces, regex-constrained fields, and tool-call schemas. This paper targets on-device and low-cost small language model (SLM) deployments,…

24
arXiv — Machine Learning research 1mo ago

Dense2MoE: Pushing the Pareto Frontier of On-Device LLMs via Unified Pruning and Upcycling

arXiv:2605.26496v1 Announce Type: new Abstract: The Mixture-of-Experts (MoE) architecture is highly promising for resource-constrained on-device deployments, yet training these models from scratch incurs prohibitive costs. Current methods attempt to alleviate this by upcycling dense…

32
Hugging Face Daily Papers research 1mo ago

MobileMoE: Scaling On-Device Mixture of Experts

Abstract: MobileMoE introduces efficient on-device Mixture-of-Experts language models with sub-billion parameters that achieve better performance and efficiency compared to dense baselines and existing MoE models. AI-generated summary: Mixture-of-Experts (MoE) has become the de…

17
arXiv — Machine Learning research 1mo ago

Signs Beat Floats: Low-Rank Double-Binary Adaptation for On-Device Fine-Tuning

arXiv:2605.24058v1 Announce Type: new Abstract: On-device adaptation of large language models commonly keeps a quantized base model frozen while training and deploying a small, task-specific LoRA adapter. In the unmerged adapter-mode setting, however, the adapter is more than a…

28
r/LocalLLaMA community 1mo ago

New local model reaching near frontier on PII removal at 9 ms CPU inference

Hi all, I've been working on this model to strip sensitive information from computer use data and would love some feedback! Submitted by /u/louis3195.

34
r/LocalLLaMA community 1mo ago

Using Local LLMs for Generating Custom Interactive Recursive Textbooks on the Fly

Submitted by /u/Ryoiki-Tokuiten.

28
llama.cpp releases dev-tools 1mo ago

b9315

llama : document that only one on-device state can be saved per sequence ( #23520 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64…

13
