I built a tool to turn your Claude Code sessions into fine-tuning data for local models
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
If you use Claude Code, every session is already sitting on disk as a .jsonl file under ~/.claude/projects/. It has real coding conversations: multi-turn edits, tool calls, reasoning traces. That's training data you already generated for free.
The problem is the format is not what any fine-tuning framework expects. So I built claude_converter to bridge that gap.
What it does:
- Converts Claude Code
.jsonlsessions into themessagesformat thatapply_chat_template()consumes directly - Outputs are compatible with TRL/SFTTrainer, Axolotl, and LLaMA-Factory (sharegpt format)
- Ships a
clean_messages()helper to strip<tool_use>,<tool_result>, and<thinking>blocks before training - Includes an
inspect_session()CLI-style function with token counts and block breakdowns so you know what you're working with before you train on it - Zero dependencies
Quick example:
```python import glob from datasets import Dataset from trl import SFTTrainer, SFTConfig from claude_converter import session_to_messages, clean_messages
all_messages = [] for path in glob.glob("~/.claude/projects/*/.jsonl", recursive=True): msgs = clean_messages(session_to_messages(path)) if len(msgs) >= 2: all_messages.append({"messages": msgs})
dataset = Dataset.from_list(all_messages) ```
One caveat worth calling out: raw sessions include failed attempts, retries, and dead ends. Don't train on everything blindly. Filter to sessions where the final assistant turn actually solved the problem.
Repo: https://github.com/FredyRivera-dev/claude_converter
uv pip install claude-converter
Happy to answer questions about the format or the conversion logic.
[link] [comments]
More from r/LocalLLaMA
-
Palantir CEO rages against closed models
Jul 2
-
SenseNova-U1-8b-MoT-Infographic-V2 (released yesterday) - An open source SOTA beast for infographic design and image editing.
Jul 2
-
[Benchmark] Kimi K2.7 Code Q3 on Mac Studio M3 Ultra + RTX PRO 6000 over llama.cpp RPC: prefill improves, no changes in token generation/decode
Jul 2
-
They fit! Mostly.... 2x 3090, Thermaltake Core p3
Jul 2
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.