Neuron Populations Exhibit Divergent Selectivity with Scale [R]
Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.
Hi! We just released a paper where we study “Rosetta Neurons”: universal neurons across different neural networks, and their relationship to scaling laws, specialization, and monosemanticity. Would love to kick off a discussion and get the community's thoughts.

Main Findings:
- We find that the universal Rosetta Neurons scale as a sublinear power law: larger models have more of them, but they occupy a shrinking fraction of all neurons.
- They also become more selective/monosemantic and more specialized with scale.
- We can use a single Rosetta Neuron to filter data for continued pretraining and nearly match oracle data filtering.

Paper: https://arxiv.org/abs/2606.03990
Summary thread: https://x.com/_AmilDravid/status/2062959617941074069?s=20
Code: https://github.com/avdravid/rosetta-neuron-scaling
Project page: https://avdravid.github.io/rosetta-neuron-scaling/
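For readers who want to see why a sublinear power law means "more neurons in absolute terms, but a shrinking fraction", here is a minimal sketch. It is not the paper's code: the function name `fit_power_law`, the constants, and the model sizes are all hypothetical, and the counts are synthetic values generated from an assumed exponent rather than measured results. If the count follows c * N^alpha with 0 < alpha < 1, then the fraction count / N follows c * N^(alpha - 1), which decreases as N grows.

```python
import math

def fit_power_law(sizes, counts):
    """Fit count = c * size**alpha by least squares in log-log space.

    Hypothetical helper for illustration; returns (c, alpha).
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(k) for k in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx                       # slope = power-law exponent
    c = math.exp(mean_y - alpha * mean_x)   # intercept = log of the prefactor
    return c, alpha

# Synthetic, illustrative numbers only (NOT taken from the paper).
sizes = [1e3, 1e4, 1e5, 1e6]                # total neurons per model (assumed)
true_c, true_alpha = 2.0, 0.5               # assumed sublinear exponent
counts = [true_c * s ** true_alpha for s in sizes]

c, alpha = fit_power_law(sizes, counts)
fractions = [k / s for k, s in zip(counts, sizes)]

print(f"fitted exponent alpha = {alpha:.3f}")
for s, k, f in zip(sizes, counts, fractions):
    print(f"N = {s:>9.0f}  universal = {k:>8.1f}  fraction = {f:.5f}")
```

With any exponent below 1, the count column rises while the fraction column falls, which is the qualitative pattern described in the first finding. The actual exponent and measurements are in the paper.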