r/MachineLearning
r/MachineLearning community 10d ago
EMA on LoRA? [R]
Hi guys. Does anyone know of papers where EMA on LoRA adapters has been used successfully? I'm interested in cases where the EMA adapter acts as a self-teacher generating soft labels for the trainable adapter. On-policy self-distillation [1] uses EMA for the teacher. However, they…
20 -
r/MachineLearning community 10d ago
A slightly improved DVD-JEPA demo [P]
Hey! I came across this post, which I found quite neat as a minimal demonstration of JEPA. However, as the comments pointed out, there was some room for improvement. So I added a few things such as environment noise and a fair* comparison to a pixel-space baseline. I think the…
19 -
r/MachineLearning community 11d ago
I released a softmax-free attention model at GPT-2 Medium scale (~354M params, 11.5B tokens): structural sparsity + tile-skipping kernels for long-context VRAM savings. Open weights + custom Triton kernels [R]
Submitted by /u/NonGameCatharsis
29 -
r/MachineLearning community 11d ago
Looking for an ML/data collaborator — open to any project idea [P]
I want to team up on an ML project, no fixed idea yet. Open to whatever's interesting: NLP, CV, time series, whatever you're into. Looking for: anyone with an idea (or without, we can think about something together) + an ML engineer to build it with. Goal: my goal is to strengthen my…
33 -
r/MachineLearning community 11d ago
TSAuditor: A time-series auditing framework [P]
This happened a few months ago when I was working on an analysis project that dealt with time-series data. The dataset was large (10 years of data). I was using a standard profiling tool to check the pipeline. Everything looked fine because the tool reported a 3% missing data rate…
29 -
r/MachineLearning community 11d ago
Would you let an ML PhD student graduate without a top-tier paper? [D]
Suppose you’re a PhD advisor in machine learning. Your student has been in the program for 4 years, has done solid work, and has a coherent thesis direction, but they haven’t published in an A* ML venue or top journal. No NeurIPS/ICML/ICLR/CVPR/etc., and no equivalent top venue in…
11 -
r/MachineLearning community 12d ago
DVD-JEPA: an open-source, fully-reproducible JEPA world model [P]
A paper currently trending on paperswithcode.co in the "Anomaly Detection" category is DVD-JEPA. https://i.redd.it/r6fd8n3d4f8h1.gif Here is the short summary: Most attempts to learn a world model from video try to predict the next frame pixel-by-pixel, and drown in detail that…
11 -
r/MachineLearning community 12d ago
Time Series Modeling Needs a Dynamical Systems Perspective [R]
In our #ICML2026 position paper we argue a dynamical systems perspective is needed to drive time series (TS) modeling forward: https://arxiv.org/abs/2602.16864 Essentially all time series in nature and engineering come from some underlying dynamical system (DS), mostly chaotic…
31 -
r/MachineLearning community 12d ago
Built a Global AQ (PM2.5) Forecaster ML Model [P]
Hey everyone, I’ve been building an end-to-end Air Quality (PM2.5) forecasting pipeline for 4 countries (US, UK, India, Australia) using 1.6M+ rows of OpenAQ and NASA weather data. The problem I hit (the variance trap): My V7 model was a standard stateless Gradient Boosting…
23 -
r/MachineLearning community 12d ago
Best library for releasing my research optimization algorithm? [D]
Hi All! I have developed a research optimizer (QQN, Quadratic Quasi-Newton) and published a paper on it where I am able to, but I would really like to make the algorithm itself easily available to the community for evaluation. I have Rust, Java, and JavaScript implementations,…
36 -
r/MachineLearning community 13d ago
Neuron Populations Exhibit Divergent Selectivity with Scale [R]
Hi! We just released a paper where we study “Rosetta Neurons”: universal neurons across different neural networks, and their relationship to scaling laws, specialization, and monosemanticity. Would love to kick off a discussion and get the community's thoughts. Main Findings: We…
11 -
r/MachineLearning community 13d ago
Latent space interpretation [R]
Hi all, I have trained a convolutional autoencoder on a set of medical images. I then classified the latent feature maps using a random forest to find the top-scoring feature map. Now my goal is to understand which input image is captured in the top-scoring latent feature map. Any…
29 -
r/MachineLearning community 14d ago
Is ACL now irrelevant? [D]
I just read in a comment on another post that an ACL paper is apparently considered a weak signal in the community, and having a first-author ACL paper is not a great plus for improving chances of finding a PhD position. Is this some kind of ragebait or is academia becoming…
26 -
r/MachineLearning community 14d ago
Any idea if AAAI will be as harsh on computer vision papers as last year? [R]
Hello everyone, I have a computer vision paper ready for submission, and a coauthor has suggested submitting it to AAAI. However, last year computer vision papers had a very low acceptance rate at AAAI, with reviewers receiving emails to specifically tell them that the…
17 -
r/MachineLearning community 14d ago
How hard is it to break into ML work without a Master's degree? [D]
I'm currently a software engineer (mostly mobile/iOS development) and have recently started learning machine learning because I genuinely find it interesting, especially the math behind it. I have a fairly strong math background and am comfortable with calculus, probability, and…
22 -
r/MachineLearning community 14d ago
Multivariate Probability Models in Machine Learning [D]
Hello Folks, we start our discussion on Lecture 10 of Probabilistic Machine Learning, now starting with Probability Multivariate Models. Univariate models are toy cases; in real life, ML models are multivariate. To understand the dependence of more than one variable on each other…
32 -
r/MachineLearning community 14d ago
Should I accept job offer or do my master's? [D]
I graduated with my bachelor's from a top-3 CS program and have had a rough recruiting season. I received a full-time offer as an AI Product Engineer at a tax software company, where they are trying to become more AI-native. It's essentially a PM + AI engineering role. Long term I'd…
8 -
r/MachineLearning community 14d ago
How do you analyze the relative "strength" of probes? [R]
This question is related to topics like language+ models (including multimodal) and things like "circuit" analyses. I think something related might come up in my work (factuality guarantees for model outputs) and I'm trying to orient to the SoTA. I found this old post on trying…
21 -
r/MachineLearning community 14d ago
No CVPRW report [D]
I participated in the Denoising Challenge (Gaussian noise level 50), managed to get a decent rank, and was looking forward to citing the report in my CV etc., but it seems the organiser is not planning to release the report. I can't see any entry on the open-access NTIRE page. Is the…
25 -
r/MachineLearning community 14d ago
ICML (DL4C) Accepted (Few queries) [D]
Just got the email that I have been accepted to DL4C @ICML 2026. As the email did not contain any details on logistics, can someone help here? - Is it mandatory to attend the workshop? - What's the usual expense apart from flights? Can someone add details like fees and…
9 -
r/MachineLearning community 15d ago
Next-Latent Prediction Transformers [R]
Microsoft Research Preprint Next-token prediction is myopic. What if transformers learn to predict their own next latent state? Microsoft Research presents Next-Latent Prediction (NextLat): a self-supervised learning method that teaches transformers to form compact world models…
27 -
r/MachineLearning community 15d ago
What is Speculative Decoding? (trending on paperswithco.de) [R]
A method that is currently trending on Papers with Code is Speculative Decoding. https://preview.redd.it/dm4nh4t71o7h1.png?width=3082&format=png&auto=webp&s=b6468668667d4bcfb6c9248d3af7fd09f21fe0da Speculative decoding is an inference optimization technique that uses a fast,…
19 -
r/MachineLearning community 15d ago
[ECCV 2026] Final Decisions [D]
ECCV 2026 final decisions are expected to be released on June 17, 2026. Since there was no exact release time specified, results will likely roll out within 48 hours. This thread is for everyone to share updates, discuss outcomes, and support each other through the decisions.…
26 -
r/MachineLearning community 16d ago
Source code for LLMs. [D]
I was digging through Hugging Face’s Transformers repo and found https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt_oss/modeling_gpt_oss.py From what I can tell, this isn’t just boilerplate; it looks like a full implementation. Is it actually the…
15 -
r/MachineLearning community 16d ago
How the brains learn [R]
Abstract: A sufficient account of how the neocortex learns must meet three criteria: Computationally, it must approximate a powerful, general-purpose learning algorithm known to scale to human-level intelligence; Algorithmically, it must be implementable using known,…
9 -
r/MachineLearning community 16d ago
Cleo: trying to fit full analyst behavior in a 2B model [P]
Hello all! Half of all industrial "chatbots" are just text-to-SQL models in a trenchcoat (and the other half RAG!). I wanted to explore just how small you could make these models if you trained, evaluated, and ran inference in the exact same structured harness, leading to Cleo:…
4