How many of you use Q1 or Q2 quants of big models (100-250B)? How is it?
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Sharing some popular (and recent) models for reference:
151-250B:
- DeepSeek-V4-Flash
- Step-3.X-Flash
- Command-a-plus-05-2026
- Laguna-M.1
- MiniMax-M2.X
- Qwen3-235B-A22B
100-150B:
- GLM-4.5-Air
- Qwen3.5-122B-A10B
- NVIDIA-Nemotron-3-Super-120B-A12B
- Mistral-Small-4-119B-2603
- Devstral-2-123B-Instruct-2512
- Mistral-Medium-3.5-128B
- Llama-4-Scout-17B-16E-Instruct (Yay! got your attention)
<100B:
- Llama-3.3-70B-Instruct
- Qwen3-Coder-Next
- Qwen3-Next-80B-A3B
I see that some people use Q3 (even down to IQ3_XXS) whenever they can't run Q4 on their rig. Ex: I noticed that some DGX/SH users run Q3 of MiniMax-M2 models because Q4 is such a tight fit.
I guess Q1/Q2 won't be good for small/medium models (~40B), at least at the agentic coding level. For chatting, the quality would probably be semi-usable, though I'm not sure.
But I believe it's the opposite for big/large models because of their larger size. So how many of you use Q1 or Q2 of big models (100-250B)? How is it, and is it good enough for you right now? Please share your feedback on agentic coding, writing, and chatting with such quants of the models above. Also, please let us know what issues you are facing with Q1/Q2 quants. Ex: looping, repetition, tool-calling issues, etc.
Personally, I don't go below Q4 for small/medium models, even though I have only 8GB VRAM on my current laptop. My upcoming rig comes with 96GB VRAM + 128GB RAM, which is why I posted this thread. I'm thinking of trying Q1/Q2 of models like NVIDIA-Nemotron-3-Ultra-550B-A55B, GLM-5.X, etc.
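For a rough sense of which quants would even fit in 96GB VRAM + 128GB RAM, here is a back-of-the-envelope sketch. It assumes file size is roughly parameters × bits per weight / 8. The bits-per-weight values are approximate figures (assumptions, not measurements), the 16GB overhead for KV cache and OS is a guess, and the function names are made up for illustration. Real GGUF files mix tensor types, so actual sizes will differ.

```python
# Approximate bits per weight (bpw) per quant type. These are assumptions;
# real GGUF files mix tensor types, so actual file sizes will differ.
APPROX_BPW = {
    "IQ1_S": 1.56,
    "IQ2_XXS": 2.06,
    "IQ3_XXS": 3.06,
    "Q4_K_M": 4.8,  # rough average, varies by model
}


def estimate_size_gb(params_billion: float, bpw: float) -> float:
    """Estimate weight size in GB: (params * bits per weight) / 8 bits per byte."""
    return params_billion * bpw / 8


def fits(params_billion: float, quant: str, budget_gb: float,
         overhead_gb: float = 16.0) -> bool:
    """Check whether weights plus a guessed overhead fit in the memory budget."""
    size_gb = estimate_size_gb(params_billion, APPROX_BPW[quant])
    return size_gb + overhead_gb <= budget_gb


if __name__ == "__main__":
    budget_gb = 96 + 128  # VRAM + RAM of the upcoming rig
    models = [
        ("Qwen3-235B-A22B", 235),
        ("NVIDIA-Nemotron-3-Ultra-550B-A55B", 550),
    ]
    for name, params_billion in models:
        for quant, bpw in APPROX_BPW.items():
            size_gb = estimate_size_gb(params_billion, bpw)
            verdict = "fits" if fits(params_billion, quant, budget_gb) else "too big"
            print(f"{name} {quant}: ~{size_gb:.0f} GB ({verdict})")
```

Under these assumptions, a 550B model only squeezes into that budget at around Q2 or below, which is why the Q1/Q2 question matters for models that size. This only says whether the weights fit, not how fast they run once layers spill over into system RAM.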