Fine-tuned Qwen2.5-7B to 96% of Claude Haiku on a domain-specific task using ~$3 of API calls and zero human labelers
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
| Built a decision-reasoning engine (Orlog) and wanted to fine-tune a local model for it instead of paying per-call forever. The method (DV-DPO):
Only genuine revisions under adversarial pressure become training signal. Not format preference, not sampling variance. Results:
Autonomous loop now running: GGUF ready for Ollama. Happy to share the pipeline if there's interest. [link] [comments] |
More from r/LocalLLaMA
-
What's in your RAG?
Jul 2
-
Palantir CEO rages against closed models
Jul 2
-
A cheap trick for reliable structured output: feed the validation error back into the retry
Jul 2
-
SenseNova-U1-8b-MoT-Infographic-V2 (released yesterday) - An open source SOTA beast for infographic design and image editing.
Jul 2
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.