How are production ML systems typically handling distribution shift over time? [D]
Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.
In deployed ML systems, data distribution drift seems unavoidable over longer time horizons.
I’m trying to understand what approaches are commonly used in practice:
- Continuous retraining pipelines (fixed intervals vs trigger-based)
- Online monitoring for feature or prediction drift
- Use of shadow models or fallback models in production
- Human-in-the-loop review for edge cases
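
To make the trigger-based option concrete, here is a minimal sketch of a drift check that could gate retraining. It assumes a single numeric feature, uses the Population Stability Index (PSI) as the drift score, and treats 0.2 as the alert threshold. That threshold is a common rule of thumb rather than anything established in this thread, and the names `psi` and `should_retrain` are hypothetical helpers, not part of any library.

```python
import bisect
import math
import random


def psi(reference, live, n_bins=10, eps=1e-6):
    """Population Stability Index between a reference sample and a live sample."""
    ref_sorted = sorted(reference)
    # Bin edges sit at reference quantiles, so each bin holds roughly equal reference mass.
    edges = [ref_sorted[int(len(ref_sorted) * i / n_bins)] for i in range(1, n_bins)]

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            # bisect_left returns how many edges are strictly below v, i.e. the bin index.
            counts[bisect.bisect_left(edges, v)] += 1
        # Floor each fraction at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    p = bin_fractions(reference)
    q = bin_fractions(live)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def should_retrain(reference, live, threshold=0.2):
    """Trigger rule (assumed): flag retraining when PSI exceeds the threshold."""
    return psi(reference, live) > threshold


# Synthetic data for illustration only.
random.seed(0)
reference = [random.gauss(0.0, 1.0) for _ in range(5000)]
stable = [random.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [random.gauss(1.0, 1.0) for _ in range(5000)]

print("stable window triggers retrain:", should_retrain(reference, stable))
print("shifted window triggers retrain:", should_retrain(reference, shifted))
```

A fixed-interval pipeline would skip the check and retrain on a schedule. The trigger-based variant only retrains when a score like this crosses the threshold, which moves the cost from compute to monitoring and threshold tuning.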
In most real deployments I’ve seen discussed, the retraining strategy seems to be limited more by operational constraints than by anything about the model itself.
Curious what approaches are actually working reliably in production environments and what tends to fail first.