EMA on LoRA? [R]
Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.
Hi guys
Does anyone know of papers where EMA on LoRA adapters has been used successfully?
I'm interested in cases where the EMA adapter acts as a self-teacher, generating soft labels for the trainable adapter.
On-policy self-distillation [1] uses an EMA teacher, but as far as I can tell they do full fine-tuning. Are there any empirical results showing the idea works with LoRA/PEFT models?
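To make the setup concrete, here is a rough sketch of what I have in mind. It is not taken from [1]. It is dependency-free Python where the adapter is a plain dict of flat weight lists, and all names (`ema_update`, `distill_loss`, `lora_A`, `lora_B`, the decay and temperature values) are my own assumptions for illustration. The frozen base model would be shared, so only the adapter weights get an EMA copy.

```python
import copy
import math


def ema_update(teacher, student, decay=0.99):
    """In-place EMA over adapter weights only:
    teacher <- decay * teacher + (1 - decay) * student."""
    for name, s_vals in student.items():
        t_vals = teacher[name]
        for i, s in enumerate(s_vals):
            t_vals[i] = decay * t_vals[i] + (1.0 - decay) * s


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distill_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions.
    The teacher distribution plays the role of the soft labels."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * (math.log(pi) - math.log(qi))
               for pi, qi in zip(p, q) if pi > 0.0)


# Hypothetical adapter: flattened LoRA A/B factors for one layer.
student = {"lora_A": [0.10, -0.20], "lora_B": [0.00, 0.00]}
teacher = copy.deepcopy(student)  # EMA teacher starts as a copy

# After each optimizer step on the student adapter (not shown here),
# the teacher adapter is nudged towards it.
student["lora_B"] = [0.50, -0.50]  # stand-in for a gradient update
ema_update(teacher, student, decay=0.9)
```

In a real run the logits would come from two forward passes through the same frozen base model, one with the teacher adapter (no gradient) and one with the student adapter, and `distill_loss` would be added to or replace the usual training loss.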