[audio.cpp] The Sound of GGML — C++/GGML native ACE-Step, Stable Audio, HeartMuLa, RoFormer, HTDemucs released. 10-Minute Music in 60 Seconds!
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
I just released a big music/audio expansion. This batch adds music generation, SFX generation, and source separation to the released framework surface.

Bonus: HeartMuLa is no longer capped at the old short limit. It can now generate around 10 minutes of audio in one run.

Current framework progress: 21 / 28 (75%)

This is no longer just “TTS in C++.”

Not everything is magically faster yet. HTDemucs is currently slower than the Python path in my test, and Stable Audio warm runs are mixed. I’m not trying to hide that. The current release is about getting the end-to-end paths into the shared framework first, then tightening backend-specific performance.

Repo: https://github.com/0xShug0/audio.cpp

I’d love feedback from people trying these on different GPUs/CPUs, especially on long generations, weird prompts, stem separation quality, backend issues, performance numbers, and anything that breaks.