Donate your coding sessions to an open CC-BY-4.0 dataset to help train open-weight and open source models
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Anthropic and OpenAI are getting so much data from Claude Code and Codex usage, and I'm quite scared this will create an oligopoly, because only their models will be trained on it, leaving open-weight and open source models behind.

So I'm trying to launch a little initiative called Trace Commons, encouraging people to donate their coding agent traces to an open dataset at https://trace-commons-web.hf.space/ so that other model labs can also train on them.

Let me know if you have any feedback, and hopefully we can have a nice open dataset soon!