r/MachineLearning · May 26, 2026 · 1 min read

[P] Built a portable GPU ISA after reading too many architecture manuals

I’ve been reading GPU architecture docs in my free time. NVIDIA PTX, AMD ISA reference guides, Intel Xe, reverse-engineered Apple GPU stuff. Over 5,000 pages across 16 microarchitectures.

After a while you notice all four vendors are doing the same 11 things with different names. So I wrote a spec that covers all of them and built a toolchain around it. It’s called WAVE. You write a kernel once, it compiles to a portable binary, then thin backends translate it to Metal, PTX, HIP, or SYCL.
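To make the "same things, different names" idea concrete, here is a toy sketch of a thin backend as a lookup table. It is not WAVE's actual ISA or API: the portable op names (`local_id_x`, `barrier`), the `Op` class, and the `translate` helper are all made up for illustration. The vendor spellings on the right-hand side are the usual ones for the local thread index and a workgroup barrier.

```python
# Toy illustration only. The portable op names and this table layout are
# hypothetical; they are NOT taken from the WAVE spec or the wave-gpu package.
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str  # hypothetical portable operation name


# Each backend spells the same two concepts differently:
# the thread's local index along x, and a workgroup-wide barrier.
BACKENDS = {
    "metal": {
        "local_id_x": "[[thread_position_in_threadgroup]].x",
        "barrier": "threadgroup_barrier(mem_flags::mem_threadgroup)",
    },
    "ptx": {
        "local_id_x": "%tid.x",
        "barrier": "bar.sync 0",
    },
    "hip": {
        "local_id_x": "threadIdx.x",
        "barrier": "__syncthreads()",
    },
    "sycl": {
        "local_id_x": "item.get_local_id(0)",
        "barrier": "group_barrier(item.get_group())",
    },
}


def translate(program, target):
    """Map each portable op to the target's spelling via a table lookup."""
    table = BACKENDS[target]
    return [table[op.name] for op in program]


# One "kernel" written once against the portable names...
kernel = [Op("local_id_x"), Op("barrier")]

# ...and lowered separately for each target.
for target in BACKENDS:
    print(target, translate(kernel, target))
```

A real backend has to handle much more than renaming (register allocation, memory spaces, wave width), so treat this only as a picture of why a shared vocabulary keeps the per-vendor layer thin.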

Same binary verified on Apple M4 Pro, NVIDIA T4, and AMD MI300X. My co-author Onyinye built PyTorch integration and got identical training results across all backends.

Please star on GitHub: https://github.com/Oabraham1/wave
Preprint: https://arxiv.org/abs/2603.28793
Read full docs and how I built everything: https://wave.ojima.me

pip install wave-gpu

submitted by /u/not-your-typical-cs