Shortcuts in the Tail: Debiasing via Post-Hoc Spectral Compression of Fine-Tuning Updates
Mirrored from arXiv — Machine Learning for archival readability. Support the source by reading on the original site.
Computer Science > Machine Learning
Title:Shortcuts in the Tail: Debiasing via Post-Hoc Spectral Compression of Fine-Tuning Updates
Abstract:Fine-tuning often introduces spurious correlations alongside task knowledge, causing systematic failures on underrepresented groups. Existing mitigations require retraining, group labels, or curated counterfactual data. We show a simple post-hoc intervention reduces shortcut reliance without any of these: truncating the tail of the SVD of $\Delta W = W_\mathrm{ft} - W_\mathrm{base}$ reduces the spurious-group gap while preserving task accuracy. Across three instruction-tuned models ($0.5$B--$7$B) and four classification benchmarks, top-$k$ truncation reduces the gap on every cell at $<2$ pp accuracy loss, by up to $5\times$ on CivilComments. We propose this works because the shortcut response sits in the tail of the singular ordering of $\Delta W$, a claim about how truncation behaves rather than about the raw singular values, which are broadly distributed and look the same across all four datasets. A controlled boundary case in which fine-tuning has only a shortcut to learn shows the predicted FT-to-base collapse, and bottom-/random-$k$ and matched-rank LoRA controls rule out generic low-rank approximation and rank-constrained training as the explanation. We read this as preliminary evidence that the singular basis of $\Delta W$ is a useful coordinate system for studying what fine-tuning has learned.
| Comments: | ICML Weight Space Symmetries Workshop 2026 |
| Subjects: | Machine Learning (cs.LG) |
| Cite as: | arXiv:2606.07596 [cs.LG] |
| (or arXiv:2606.07596v1 [cs.LG] for this version) | |
| https://doi.org/10.48550/arXiv.2606.07596
arXiv-issued DOI via DataCite (pending registration)
|
Access Paper:
- View PDF
- HTML (experimental)
- TeX Source
References & Citations
Bibliographic and Citation Tools
Code, Data and Media Associated with this Article
Demos
Recommenders and Search Tools
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.
More from arXiv — Machine Learning
-
Representation as a Bottleneck for Mechanistic Interpretability: The Manifestation Unit Protocol
Jul 2
-
SNAP-FM: Sparse Nonlinear Accelerated Projection for Physics-Constrained Generative Modeling
Jul 2
-
SemiScope: Disentangling Classifier Tuning and Joint Optimization in Semi-Supervised Security Classification
Jul 2
-
A Filtered Mixture-of-Generators for Fully Synthetic Survival Training
Jul 2
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.