DeepSpec - a deepseek-ai Collection
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
DeepSpec

DeepSpec is a full-stack codebase for training and evaluating draft models for speculative decoding. It contains data preparation utilities, draft model implementations, training code, and evaluation scripts.

Released Checkpoints

The released checkpoints are the ones used for Table 1 in the paper. Each checkpoint was trained on open-perfectblend data generated by its corresponding target model in non-thinking mode, and is the direct output of the corresponding training configuration under config/.

If you cite these results in a new paper, align your setup with the training settings in this repository; otherwise, the comparison is not meaningful. For domain-specific use, fine-tune the draft model further for better results, especially if the target model is expected to run in thinking mode.

Supported Algorithms

Currently, DeepSpec includes three draft models: DSpark, DFlash and Eagle3.

HuggingFace: https://huggingface.co/collections/deepseek-ai/deepspec
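For readers new to the term, here is a minimal sketch of what a draft model does in speculative decoding. It is not DeepSpec code and does not use its API: every name below (`speculative_step`, `speculative_decode`, `toy_target`, `toy_draft`) is hypothetical, the "models" are toy functions over integer tokens, and only the simple greedy variant is shown rather than the sampling-based acceptance rule. The draft proposes `k` tokens, the target checks them, and the longest matching prefix is kept, so the output matches plain greedy decoding with the target.

```python
from typing import Callable, List, Tuple

# A "model" here is just a function from a token sequence to the next token.
NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], target_next: NextToken,
                     draft_next: NextToken, k: int) -> Tuple[List[int], int]:
    """Run one draft-then-verify round; return (new tokens, accepted draft count)."""
    # 1) The cheap draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft_next(prefix + proposal))

    # 2) The target verifies the proposal. A real system would score all k
    #    positions in one batched forward pass; this toy calls it per position.
    new_tokens: List[int] = []
    for tok in proposal:
        expected = target_next(prefix + new_tokens)
        if tok != expected:
            # First mismatch: keep the target's token and discard the rest.
            new_tokens.append(expected)
            return new_tokens, len(new_tokens) - 1
        new_tokens.append(tok)

    # 3) Every draft token matched, so the target adds one bonus token.
    new_tokens.append(target_next(prefix + new_tokens))
    return new_tokens, k


def speculative_decode(prompt: List[int], target_next: NextToken,
                       draft_next: NextToken, max_new_tokens: int,
                       k: int = 4) -> Tuple[List[int], int]:
    """Generate max_new_tokens tokens; also return the total accepted draft tokens."""
    out = list(prompt)
    accepted_total = 0
    while len(out) - len(prompt) < max_new_tokens:
        new_tokens, accepted = speculative_step(out, target_next, draft_next, k)
        out.extend(new_tokens)
        accepted_total += accepted
    return out[:len(prompt) + max_new_tokens], accepted_total


def greedy_decode(prompt: List[int], target_next: NextToken,
                  max_new_tokens: int) -> List[int]:
    """Baseline: plain greedy decoding with the target model only."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        out.append(target_next(out))
    return out


def toy_target(seq: List[int]) -> int:
    """Toy deterministic 'target model' over a 50-token vocabulary."""
    return (31 * sum(seq) + len(seq)) % 50


def toy_draft(seq: List[int]) -> int:
    """Toy 'draft model' that disagrees with the target at every 4th position."""
    t = toy_target(seq)
    return t if len(seq) % 4 else (t + 1) % 50
```

The more often the draft agrees with the target, the more tokens each verification round yields, which is why a draft model trained on data generated by its own target (as described above) is the quantity of interest.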