Big News for AMD / Strix Halo+ Owners
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Admittedly this is news for me, but I'm hoping it could be of some use to others here as well!
So, THE NPU IS USABLE!!
I've owned an AMD Ryzen 395 Max AI+ (or whatever the naming is lol) for about a year now and have relied solely on GGUFs and Vulkan. I acknowledge that the AMD Ryzen AI team has been working hard to get their ROCm software up to speed w/ their hardware.
https://kyuz0.github.io/amd-strix-halo-toolboxes/
This database did NOT look so ROCm friendly 6 months ago.
Why should I care?
If you own a device w/ both an NPU and a iGPU (like the strix halo series) then you WANT hybrid models. The NPU is CRAZY FAST at PromptProcessing, and can run parallel to gpu firing.Okay, What is Hybrid Mode?
So, LLMs can run through the NPU only. If they're built for it. Check out "FastFlowLM NPU" models for examples that do that. BUT HYBRID mode combines the best of both, and FINALLY utilizes the hardware purchased nearly a year go (for some, more than that).What can i do to test this?
Download Lemonade! Thanks to their efforts that focus primarily on Ryzen AI and working directly w AMD, I've FINALLY got my machine working in ways it couldn't a year ago and Lemonade made it happen. It's GUI is ultra bare-bones and I wouldn't recommend it for any actual agentic/chat/harness usage BUT being able to sanity-test software without investing days or weeks into it?
10/10
Here's the link: lemonade-server.ai
Speaking of links, read more about Hybrid Mode and making your own Hybrid Models here: https://ryzenai.docs.amd.com/en/latest/llm/overview.htmlhttps://ryzenai.docs.amd.com/en/latest/llm/overview.html
---
So, that's it. Just wanted to share. REALLY EXCITED that my year old computer is still advancing in the software science of it all.
I have a single wishlist/request now: MTP-supported Hybrid Models. Qwen 3.6 has that speedup tech introduced by Unsloth, and AMD has a guide for "new processor shapes" since 3.6 GGUF can't simply be "converted to ONNX". Here's that guide: https://ryzenai.docs.amd.com/en/latest/oga_op_prepare.html
If anyone attempts it, please share on huggingface!
This was all written by hand btw, no llm assistance, just passionate dev obsessed w "new shiny".
[link] [comments]
More from r/LocalLLaMA
-
Palantir CEO rages against closed models
Jul 2
-
A cheap trick for reliable structured output: feed the validation error back into the retry
Jul 2
-
SenseNova-U1-8b-MoT-Infographic-V2 (released yesterday) - An open source SOTA beast for infographic design and image editing.
Jul 2
-
[Benchmark] Kimi K2.7 Code Q3 on Mac Studio M3 Ultra + RTX PRO 6000 over llama.cpp RPC: prefill improves, no changes in token generation/decode
Jul 2
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.