Mellum2 local deployments
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Hey local community, I work at JetBrains with the team that trained the Mellum2 models: 12B-2.5A LLMs. These models are trained completely from scratch and target fast inference. Our primary goal was production deployments on H100/H200s, but local deployments work well too. We open-sourced a few checkpoints on HF earlier this month and also published the full technical report on arXiv. Our benchmarks show that the models perform as well as other small language models (SLMs) but provide significantly higher throughput under concurrent load (pic attached).

Various GGUFs are now available on ollama and HF as well, and we would really like to hear your feedback. What works well for you, and what doesn't? What are your expectations from such small models, and do we meet them? What's your hardware setup, and is this model useful for you?