README_EN.md · openpangu/openPangu-2.0-Flash at main
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
| 1. IntroductionopenPangu-2.0-Flash is an MoE model trained on Ascend. The model has 92B total parameters and 6B activated parameters. Its context length is 512k. The total pretraining data contains 34T tokens. During Post-training, openPangu-2.0-Flash is trained through unified SFT with slow and fast thinking capability, multiple specialist RL traning, on-policy distillation combining multiple RL specialists. 2. ArchitectureopenPangu-2.0-Flash brings several major architectural improvements:
[link] [comments] |
More from r/LocalLLaMA
-
Palantir CEO rages against closed models
Jul 2
-
SenseNova-U1-8b-MoT-Infographic-V2 (released yesterday) - An open source SOTA beast for infographic design and image editing.
Jul 2
-
[Benchmark] Kimi K2.7 Code Q3 on Mac Studio M3 Ultra + RTX PRO 6000 over llama.cpp RPC: prefill improves, no changes in token generation/decode
Jul 2
-
They fit! Mostly.... 2x 3090, Thermaltake Core p3
Jul 2
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.