Made a new 350M model to compete with lfm2.5 but with an open license
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
I liked the idea of a nano LLM and decided to challenge myself by actually developing one. Keep in mind that I developed this model myself, so do your own research and run your own benchmarks.

It has been a while since I posted on this subreddit; I've been busy getting better. I trained, fine-tuned, and generated data for many LLMs, though they went unreleased because they were unsatisfactory. 2.0 is not from scratch, though I'm working on doing that too.

In the screenshots I accidentally saved it locally as fijik2.5! The model is the same one as the one uploaded on HF in bf16. My apologies.

I've been working on this and can finally release Fijik 2.0 350M. It is based on Granite 4 350M, continually pretrained on ~6B tokens with an Aug 2025 knowledge cutoff, then post-trained on a custom SFT corpus with mixed reasoning efforts. I've also included some samples of outputs from the model compared to lfm2.5. Keep in mind that you should use it with web search or similar; a model can't hold much knowledge at 350M parameters.

Basically, lfm2.5 is truly awesome, but I don't like its custom license. Fijik uses Apache-2.0, and unlike my previous model(s), I actually benchmarked it. Benchmarks are available in the HF readme!

If you have any questions, feel free to ask. I worked pretty hard on it and honestly, I'm pleased.

Safetensors: https://huggingface.co/Pinkstack/fijik-2.0-350m-sft

GGUF: https://huggingface.co/Pinkstack/fijik-2.0-350m-sft-GGUF (running below bf16 is not recommended. You may need to set the chat format manually in LM Studio and similar tools; the model does NOT use standard ChatML and will not work with ChatML.)

Have a good one. Once again, if you have questions, feel free to reach out. <3
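For a rough sense of what the bf16 recommendation costs in memory, here is a back-of-the-envelope sketch. It assumes exactly 350e6 parameters (the real count may differ a little), counts the weights only, and uses idealized bytes-per-parameter figures. The function name and the table are illustrative and not part of the release.

```python
# Back-of-the-envelope weight memory for a ~350M-parameter model.
# Assumption: exactly 350e6 parameters (the real count may differ slightly).
# Ignores KV cache, activations, and runtime overhead.

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,
    "int8": 1.0,  # idealized 8-bit; real GGUF quants carry extra scale data
    "int4": 0.5,  # idealized 4-bit
}


def weight_megabytes(n_params: float, dtype: str) -> float:
    """Approximate size of the weights alone, in MB (1 MB = 1e6 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e6


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: ~{weight_megabytes(350e6, dtype):.0f} MB")
```

Under these assumptions the bf16 weights come to roughly 700 MB, so going to lower precision would save only a few hundred MB at this scale.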