r/LocalLLaMA · July 2, 2026 · 1 min read

Talking with Gemma 4 31B!

#model-release #voice #open-source #gpu #security

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

Hi! I'm Andi from Hugging Face. Today I'm sharing a fully open-source demo that is free to test, pull, and modify.

It's a voice demo built as a pipeline of:
- NVIDIA's Parakeet
- Gemma 4 31B (served by Cerebras!)
- My custom inference for Qwen3TTS
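For anyone who wants a mental model of how the three stages chain together, here is a minimal sketch of one conversational turn. It is not the demo's actual code: `VoicePipeline`, `run_turn`, and the three `fake_*` stand-ins are hypothetical names made up for illustration, and the stubs only pass text through rather than calling Parakeet, Gemma, or Qwen3TTS. A real low-latency setup would presumably stream between stages instead of waiting for each one to finish.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Hypothetical wrapper chaining speech-to-text, an LLM, and text-to-speech."""
    transcribe: Callable[[bytes], str]   # audio in -> user text
    generate: Callable[[str], str]       # user text -> reply text
    synthesize: Callable[[str], bytes]   # reply text -> audio out

    def run_turn(self, audio_in: bytes) -> bytes:
        """Run one full turn: listen, think, speak."""
        user_text = self.transcribe(audio_in)
        reply_text = self.generate(user_text)
        return self.synthesize(reply_text)


# Stand-in stages (assumptions, not real model calls) so the sketch runs anywhere.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(fake_stt, fake_llm, fake_tts)
audio_out = pipeline.run_turn(b"hello")
```

Because each stage is just a callable, swapping the 31B model for a smaller local one only means passing a different `generate` function.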

It sees and searches the web faster than you can blink.

The whole stack is fully open-source and is a drop-in replacement for OpenAI's Realtime API. You can also run it locally: I get similar latencies on a MacBook Pro M3 (36GB) with Gemma 4 E4B.
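To give a rough idea of what "drop-in replacement" could mean in practice, the sketch below builds a few of the JSON client events that OpenAI's Realtime API accepts over a WebSocket (`input_audio_buffer.append`, `input_audio_buffer.commit`, `response.create`). The `REALTIME_URL` value and the helper function names are assumptions for illustration, and which events this stack actually supports is not stated in the post, so check the repo before relying on any of this.

```python
import base64
import json

# Hypothetical local endpoint; the real host, port, and path are not given in the post.
REALTIME_URL = "ws://localhost:8000/v1/realtime"


def audio_append_event(pcm_chunk: bytes) -> str:
    """Wrap a raw audio chunk as a base64 'input_audio_buffer.append' event."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm_chunk).decode("ascii"),
    })


def commit_event() -> str:
    """Signal that the buffered audio forms a complete user turn."""
    return json.dumps({"type": "input_audio_buffer.commit"})


def response_create_event() -> str:
    """Ask the server to generate a response for the committed input."""
    return json.dumps({"type": "response.create"})


# Messages a client would send, in order, for a single short utterance.
messages = [
    audio_append_event(b"\x00\x01\x02\x03"),
    commit_event(),
    response_create_event(),
]
```

If the server really does speak the same protocol, an existing Realtime client should only need its URL pointed at the local endpoint.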

Here is the web-based demo featured in the video; everything there runs in the cloud.

For those who have been following: yes, this is the pipeline that runs on Reachy Minis :)

submitted by /u/futterneid