r/MachineLearning · June 21, 2026 · 1 min read

EMA on LoRA? [R]

Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.

Hi guys

Does anyone know of papers where EMA on LoRA adapters has been used successfully?

I'm interested in cases where the EMA adapter acts as a self-teacher, generating soft labels for the trainable adapter.
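To make the setup concrete, here is a rough NumPy sketch of what I mean. It is only an illustration under my own assumptions, not the method from any paper: a single frozen linear layer `W`, one LoRA pair (`A`, `B`), and hypothetical helper names (`ema_update`, `lora_logits`, `kl_soft_label_loss`). The decay and temperature values are arbitrary.

```python
import numpy as np

def ema_update(teacher, student, decay=0.99):
    """In-place EMA over adapter weights only:
    teacher <- decay * teacher + (1 - decay) * student."""
    for name in teacher:
        teacher[name] = decay * teacher[name] + (1.0 - decay) * student[name]

def lora_logits(x, W, adapter, scale=1.0):
    """Frozen base weight W plus the low-rank update B @ A."""
    return x @ (W + scale * adapter["B"] @ adapter["A"]).T

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_soft_label_loss(student_logits, teacher_logits, T=2.0):
    """Mean KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, T)  # soft labels from the EMA teacher
    q = softmax(student_logits, T)
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return float(np.mean(kl))

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 4, 2                       # toy sizes (assumed)
W = rng.normal(size=(d_out, d_in))             # frozen base weight, shared
student = {"A": 0.1 * rng.normal(size=(r, d_in)), "B": np.zeros((d_out, r))}
teacher = {k: v.copy() for k, v in student.items()}  # EMA copy of the adapter

x = rng.normal(size=(16, d_in))

# Stand-in for one optimizer step on the student adapter.
student["B"] = student["B"] + 0.05 * rng.normal(size=(d_out, r))

# The teacher only provides targets; in a real framework no gradient
# would flow through it.
loss = kl_soft_label_loss(lora_logits(x, W, student), lora_logits(x, W, teacher))

# After the student step, move the teacher toward the student.
ema_update(teacher, student, decay=0.9)
```

The point of the question is that only the small `A`/`B` matrices are averaged here, while the base weights stay frozen and shared between teacher and student, so the teacher costs one extra adapter copy rather than a second full model.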

On-policy self-distillation [1] uses an EMA for the teacher. However, they seem to fully fine-tune. Are there any empirical results showing that the idea works on LoRA/PEFT models?

[1] https://arxiv.org/abs/2601.19897

submitted by /u/South-Conference-395
