Pay attention: a few chat apps waiting in the tray reserve 1 GB of VRAM for themselves.
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
If an application uses a web-based interface with "hardware acceleration" enabled, it builds its frames in VRAM and sometimes keeps that memory reserved even while the app is minimised.
On my Linux machine, Discord is the worst offender, reserving 450 MB VRAM. Steam takes 200 MB, Telegram 150 MB, and a few other apps top it up to 1 GB+.
If you are really squeezing something into VRAM, either close those apps or turn off "hardware acceleration" in their settings. The trade-off is that with acceleration off they stutter a lot.
Also, it may make sense to have another browser with hardware acceleration turned off, and use it only when working with an LLM.
P.S. On Linux with Nvidia, I can get a list of VRAM gobblers with the command nvidia-smi.