r/MachineLearning

r/MachineLearning community 10d ago

Best current methods for fine-tuning Whisper on domain-specific vocabulary? [P]

Hey everyone, I’m wondering whether there are any newer or more effective methods for fine-tuning Whisper on domain-specific speech. I’m working on a project where the model needs to reliably detect certain specific words and technical terms. The vocabulary and context are…

4
r/MachineLearning community 10d ago

EMA on LoRA? [R]

Hi guys, does anyone know of papers where EMA on LoRA adapters has been used successfully? I'm interested in cases where the EMA adapter acts as a self-teacher generating soft labels for the trainable adapter. On-policy self-distillation [1] uses EMA for the teacher. However, they…

20
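
A minimal sketch of the EMA-teacher bookkeeping the post describes, assuming the adapter weights are plain NumPy arrays keyed by name; the names `lora_A`/`lora_B`, the shapes, and the decay values are illustrative and not taken from the post or from [1].

```python
import numpy as np


def ema_update(teacher, student, decay=0.99):
    """Move each teacher tensor toward the student: t <- decay * t + (1 - decay) * s."""
    for name, s in student.items():
        teacher[name] = decay * teacher[name] + (1.0 - decay) * s
    return teacher


# Hypothetical LoRA factors for one layer: A is (rank x d_in), B is (d_out x rank).
rng = np.random.default_rng(0)
student = {"lora_A": rng.normal(size=(4, 16)), "lora_B": np.zeros((16, 4))}
teacher = {name: w.copy() for name, w in student.items()}  # teacher starts as a copy

# Pretend an optimizer step moved only B; the teacher trails the change slowly,
# so every entry of teacher["lora_B"] ends up at roughly 0.1.
student["lora_B"] = student["lora_B"] + 1.0
ema_update(teacher, student, decay=0.9)
```

One design point worth keeping in mind: averaging the factors A and B separately is not the same as averaging the low-rank update B @ A, so the teacher's effective delta only approximates an EMA over the merged weights.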
r/MachineLearning community 10d ago

A slightly improved DVD-JEPA demo [P]

Hey! I came across this post, which I found quite neat as a minimal demonstration of JEPA. However, as the comments pointed out, there was some room for improvement. So I added a few things, such as environment noise and a fair* comparison to a pixel-space baseline. I think the…

19
r/MachineLearning community 11d ago

I released a softmax-free attention model at GPT-2 Medium scale (~354M params, 11.5B tokens): structural sparsity + tile-skipping kernels for long-context VRAM savings. Open weights + custom Triton kernels [R]

submitted by /u/NonGameCatharsis

29
r/MachineLearning community 11d ago

Looking for an ML/data collaborator — open to any project idea [P]

I want to team up on an ML project; no fixed idea yet. Open to whatever's interesting: NLP, CV, time series, whatever you're into. Looking for: anyone with an idea (or without; we can think of something together) and an ML engineer to build it with. Goal: my goal is to strengthen my…

33
r/MachineLearning community 11d ago

Python packages for particle swarms, genetic algorithms. Scikit-opt maybe? [D]

I'm working with a client on a curve-fitting optimization problem. They currently use a constrained Levenberg-Marquardt optimizer, which is complex, slow, and sometimes gets stuck in local minima. I suggested using particle swarm optimization (PSO), and the…

17
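
For readers weighing PSO against Levenberg-Marquardt, here is a minimal from-scratch global-best PSO sketch in NumPy. It does not use scikit-opt; the inertia and acceleration constants are common textbook defaults, box constraints are handled by simple clipping (the client's real constraints are unknown), and the exponential-decay curve is a hypothetical stand-in for their fitting problem.

```python
import numpy as np


def pso_minimize(f, lower, upper, n_particles=30, n_iters=200,
                 w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Global-best particle swarm optimization over a box [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    x = rng.uniform(lower, upper, size=(n_particles, dim))  # particle positions
    v = np.zeros_like(x)                                     # particle velocities
    pbest = x.copy()                                         # each particle's best position
    pbest_val = np.array([f(p) for p in x])
    g = pbest[np.argmin(pbest_val)].copy()                   # swarm-wide best position
    for _ in range(n_iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Inertia + pull toward own best + pull toward swarm best
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lower, upper)  # crude handling of box constraints
        vals = np.array([f(p) for p in x])
        improved = vals < pbest_val
        pbest[improved] = x[improved]
        pbest_val[improved] = vals[improved]
        g = pbest[np.argmin(pbest_val)].copy()
    return g, pbest_val.min()


# Hypothetical curve-fitting task: recover (a, b) in y = a * exp(-b * t)
t = np.linspace(0.0, 5.0, 50)
y = 2.0 * np.exp(-0.8 * t)


def fit_loss(params):
    a, b = params
    return np.sum((a * np.exp(-b * t) - y) ** 2)


best, best_loss = pso_minimize(fit_loss, lower=[0.0, 0.0], upper=[5.0, 5.0])
```

Clipping is the crudest way to respect bounds; penalty terms or reflecting particles off the walls are common alternatives when the constraints are more than a simple box.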
r/MachineLearning community 11d ago

Studying FLUX in diffusers library was hard, so I built a smaller open-source version [P]

If you've tried to study modern diffusion models by digging through the official diffusers library, you know it can be overwhelming with its complexity and abstractions. I wanted to simplify FLUX diffusion models, so I built minFLUX: a PyTorch implementation focused on its core…

38
r/MachineLearning community 11d ago

TSAuditor: A time-series auditing framework [P]

This happened a few months ago when I was working on an analysis project that dealt with time-series data. The dataset was large (10 years of data). I was using a standard profiling tool to check the pipeline. Everything looked fine because the tool reported a 3% missing-data rate…

29
r/MachineLearning community 11d ago

American businesses are using Chinese AI again? [N]

https://econlab.substack.com/p/top-saas-vendors-on-ramp-june-2026 submitted by /u/NoVillage8460

36
r/MachineLearning community 11d ago

Hi Reddit, I posted my Build Your Own LLM workshop to YouTube teaching ML, LLM and math intuition [P]

Hi internet friends, I recorded a workshop about building your own LLM without any math/ML prerequisites. It covers everything from machine learning fundamentals, deep neural networks, and transformer architecture to pre-/post-training. The only prerequisite is being comfortable…

5
r/MachineLearning community 11d ago

Would you let an ML PhD student graduate without a top-tier paper? [D]

Suppose you’re a PhD advisor in machine learning. Your student has been in the program for 4 years, has done solid work, and has a coherent thesis direction, but they haven’t published in an A* ML venue or top journal. No NeurIPS/ICML/ICLR/CVPR/etc., and no equivalent top venue in…

11
r/MachineLearning community 11d ago

An open handbook on LLM inference at scale (GPU internals, KV cache, batching, vLLM/SGLang/TensorRT-LLM) [P]

I've been working through the internals of LLM inference and writing up what I learn as an open, in-progress handbook. Just wrapped another chapter on GPU execution and memory internals: why a GPU sits mostly idle during inference, how the memory hierarchy gates throughput, and…

13
r/MachineLearning community 12d ago

DVD-JEPA: an open-source, fully-reproducible JEPA world model [P]

A paper currently trending on paperswithcode.co in the "Anomaly Detection" category is DVD-JEPA. https://i.redd.it/r6fd8n3d4f8h1.gif Here is the short summary: Most attempts to learn a world model from video try to predict the next frame pixel-by-pixel, and drown in detail that…

11
r/MachineLearning community 12d ago

Time Series Modeling Needs a Dynamical Systems Perspective [R]

In our #ICML2026 position paper we argue a dynamical systems perspective is needed to drive time series (TS) modeling forward: https://arxiv.org/abs/2602.16864 Essentially all time series in nature and engineering come from some underlying dynamical system (DS), mostly chaotic…

31
r/MachineLearning community 12d ago

Built a Global AQ (PM2.5) Forecaster ML Model [P]

Hey everyone, I’ve been building an end-to-end Air Quality (PM2.5) forecasting pipeline for 4 countries (US, UK, India, Australia) using 1.6M+ rows of OpenAQ and NASA weather data. The problem I hit (the variance trap): my V7 model was a standard stateless Gradient Boosting…

23
r/MachineLearning community 12d ago

How to access the Books3 dataset for research purposes? [R]

As per the title, how can I access the Books3 dataset for research purposes? submitted by /u/xolmnyc

17
r/MachineLearning community 12d ago

Top notch best modern Probability or Statistics Books to get started with ML? [D]

Please recommend some of the best modern books on probability and/or statistics for building the probabilistic intuition or mindset needed to excel at ML, either as a single beginner-to-advanced path or as separate picks! submitted by /u/c_carav_io

9
r/MachineLearning community 12d ago

Built a local ML pipeline that blocks risky commits before they leave your machine [P]

I'm a recent CS grad trying to break into ML engineering, and I just finished the first version of a side project I've been working on. Posting it here because I want people who know this space better than me to poke holes in it. The idea started from that feeling every dev has…

4
r/MachineLearning community 12d ago

Dealing with a messy prescriptive monolith. How do you survive this? [D]

Months ago, I got my first maintenance project. Before this, I had only built new solutions from scratch and maintained my own code. But maintaining someone else's system feels completely different. It’s a prescriptive recommendation system that uses XGBoost models and…

20
r/MachineLearning community 12d ago

Best library for releasing my research optimization algorithm? [D]

Hi All! I have developed a research optimizer (QQN, Quadratic Quasi-Newton) and published a paper on it where I could, but I would really like to make the algorithm itself easily available to the community for evaluation. I have Rust, Java, and JavaScript implementations,…

36
r/MachineLearning community 12d ago

How does torch.compile() achieve massive speedups despite highly optimized NumPy functions? [D]

I was pondering this question and decided to dive deep into torch.compile. It was a lot of fun learning about operator fusion as the central idea behind torch.compile. So I created a tiny version of torch.compile in 500 lines of Python and a notebook showing how this works:…

8
r/MachineLearning community 13d ago

Fearless Concurrency on the GPU: Safe GPU inference in Rust, competitive with vLLM/SGLang [R]

I maintain cuTile Rust and just posted the paper "Fearless Concurrency on the GPU." As more GPU code gets AI-generated, the bottleneck moves from writing it to trusting it. cuTile Rust lets you write or generate GPU kernels whose memory safety and data-race freedom are verified…

29
r/MachineLearning community 13d ago

Neuron Populations Exhibit Divergent Selectivity with Scale [R]

Hi! We just released a paper where we study “Rosetta Neurons”: universal neurons across different neural networks, and their relationship to scaling laws, specialization, and monosemanticity. Would love to kick off a discussion and get the community's thoughts. Main Findings: We…

11
r/MachineLearning community 13d ago

Latent space interpretation [R]

Hi all, I have trained a convolutional autoencoder on a set of medical images, then classified the latent feature maps with a random forest to find the top-scoring feature map. Now my goal is to understand which input image is captured in the top-scoring latent feature map. Any…

29
r/MachineLearning community 13d ago

Voice debugging at the conversation level seems far more useful than isolated benchmark metrics [D]

I have been thinking a lot about how poorly isolated benchmark metrics capture real conversational system quality once models are deployed into multi-turn environments. You can have strong STT scores, decent latency, high task completion rates, and still end up with…

25
r/MachineLearning community 13d ago

HELP WITH RESEARCH: Observation - Semantically Dense Context Produces Strong Late-Layer Divergence Without Jailbreak Prompts [D]

TL;DR for ML Specialists: The Core: An empirical study on how long, semantically dense, completely benign text (with zero triggers, instructions, or jailbreak prompts) drives an implicit shift in the model's latent space trajectories. The Effect: Dilution of the initial system…

24
r/MachineLearning community 14d ago

Is ACL now irrelevant? [D]

I just read in a comment on another post that an ACL paper is apparently considered a weak signal in the community, and that having a first-author ACL paper is not a great plus for improving chances at finding a PhD position. Is this some kind of ragebait or is academia becoming…

26
r/MachineLearning community 14d ago

Any idea if AAAI will be as harsh on computer vision papers as last year? [R]

Hello everyone, I have a computer vision paper ready for submission, and a coauthor has suggested submitting it to AAAI. However, last year computer vision papers had a very low acceptance rate at AAAI, with reviewers receiving emails to specifically tell them that the…

17
r/MachineLearning community 14d ago

How hard is it to break into ML work without a Master's degree? [D]

I'm currently a software engineer (mostly mobile/iOS development) and have recently started learning machine learning because I genuinely find it interesting, especially the math behind it. I have a fairly strong math background and am comfortable with calculus, probability, and…

22
r/MachineLearning community 14d ago

Multivariate Probability Models in Machine Learning [D]

Hello Folks, we start our discussion of Lecture 10 of Probabilistic Machine Learning: Multivariate Probability Models. Univariate models are toy cases; in real life, ML models are multivariate. To understand the dependence of more than one variable on each other…

32
r/MachineLearning community 14d ago

What does provisional paper acceptance mean in ECCV? Is that the default message everyone gets? [D]

What does provisional paper acceptance mean in ECCV? Is that the default message everyone gets? submitted by /u/NotGondor

37
r/MachineLearning community 14d ago

Should I accept job offer or do my master's? [D]

I graduated with my bachelor's from a top-3 CS program and have had a rough recruiting season. I received a full-time offer as an AI Product Engineer at a tax software company, where they are trying to become more AI-native. It's essentially a PM + AI engineering role. Long term I'd…

8
r/MachineLearning community 14d ago

How do you analyze the relative "strength" of probes? [R]

This question is related to topics like language+ models (including multimodal) and things like "circuit" analyses. I think something related might come up in my work (factuality guarantees for model outputs) and I'm trying to orient to the SoTA. I found this old post on trying…

21
r/MachineLearning community 14d ago

Is foundational AI research still something that can be done without access to HPC? [D]

I'm not that well versed in ML yet. I know that "Attention Is All You Need" was based on work that was done with a couple of high-end gaming GPUs at the time. I can afford that. Suppose for argument's sake that I have caught up on ML such that I have the competence to recreate…

34
r/MachineLearning community 14d ago

No CVPRW report [D]

I participated in the Denoising Challenge (Gaussian noise level 50), managed to get a decent rank, and was looking forward to citing the report in my CV, but it seems the organiser is not planning to release the report. I can't see any entry on the NTIRE open-access page. Is the…

25
r/MachineLearning community 14d ago

Contrastive targeted SFT as a mechinterp method - has anyone mapped causal dependency interactions this way? [D]

Hi All, I've been running experiments on targeted SFT for specific capability dimensions on a 31B model. After a small training run to prime the model slightly in the direction I want, I ran a judge across 40 domains scoring six independent quality dimensions. One…

21
r/MachineLearning community 14d ago

I deployed a GAN on a Raspberry Pi 4 and built a physical NFT minting device [P]

I trained a 128×128 DCGAN on my MacBook M3 and deployed it on a Raspberry Pi 4 connected to a LILYGO TTGO T-Display ESP32. The whole thing runs headlessly as a systemd service and generates hallucinated face hybrids at the press of a button. It is a 6-block generator (latent →…

20
r/MachineLearning community 14d ago

ACL 2026 first author with weak GPA. How should I approach PhD applications? [D]

Hi everyone, I have a fairly weak undergraduate record: a 3.3/5 GPA in Computer Engineering from an average Nigerian university. For my Master's, I studied Artificial Intelligence at an average European university, where I finished with an 8/10 GPA. A condensed version of my Master's…

17
r/MachineLearning community 14d ago

ICML (DL4C) Accepted (Few queries) [D]

Just got the email that I have been accepted to DL4C @ICML 2026. As the email did not contain any details on logistics, can someone help here? - Is it mandatory to attend the workshop? - What's the usual expense apart from flights? Can someone add details like fees and…

9
r/MachineLearning community 15d ago

Next-Latent Prediction Transformers [R]

Microsoft Research Preprint Next-token prediction is myopic. What if transformers learn to predict their own next latent state? Microsoft Research present Next-Latent Prediction (NextLat) : a self-supervised learning method that teaches transformers to form compact world models…

27
r/MachineLearning community 15d ago

What is Speculative Decoding? (trending on paperswithco.de) [R]

A method that is currently trending on Papers with Code is Speculative Decoding. https://preview.redd.it/dm4nh4t71o7h1.png?width=3082&format=png&auto=webp&s=b6468668667d4bcfb6c9248d3af7fd09f21fe0da Speculative decoding is an inference optimization technique that uses a fast,…

19
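
The core of speculative decoding is the rule used to verify draft tokens. Below is a minimal single-token sketch in NumPy, assuming toy probability vectors `p_target` and `q_draft` in place of real model outputs: the draft token is kept with probability min(1, p/q), and otherwise a replacement is drawn from the normalized residual max(0, p - q), which makes the output distribution match the target model exactly.

```python
import numpy as np


def speculative_step(p_target, q_draft, rng):
    """Accept or resample one draft token; returns (token, accepted)."""
    x = rng.choice(len(q_draft), p=q_draft)  # the draft model proposes a token
    if rng.random() < min(1.0, p_target[x] / q_draft[x]):
        return x, True  # the target model accepts the draft token
    # On rejection, sample from the probability mass the draft under-covered
    residual = np.maximum(p_target - q_draft, 0.0)
    residual /= residual.sum()
    return rng.choice(len(p_target), p=residual), False


# Toy check: empirical frequencies should approach the target distribution p.
rng = np.random.default_rng(0)
p = np.array([0.6, 0.3, 0.1])
q = np.array([0.2, 0.5, 0.3])
samples = [speculative_step(p, q, rng)[0] for _ in range(20000)]
freq = np.bincount(samples, minlength=3) / len(samples)
```

In a real decoder, the draft model proposes several tokens and the target model scores them all in one forward pass; this sketch only shows the per-token accept/resample step.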
r/MachineLearning community 15d ago

Looking for a Quant Research / Development Partner for a Cross-Asset Regime Framework [D]

I'm working on a side project in systematic investing and market-state modeling. Over the last several months I've developed an investment philosophy and alpha framework, a quantitative model specification, and an engineering and implementation specification. The project focuses on…

19
r/MachineLearning community 15d ago

Mel AI just shared a demo of video-native AI characters that can talk, react, and respond to camera context in real time [N]

Character AI, founded by former Google/LaMDA developers Noam Shazeer and Daniel De Freitas, proved that text-based character chat can work as a real entertainment category. But the next chapter might not be better text chat. It might be real-time video interaction. Mel AI…

32
r/MachineLearning community 15d ago

I built a leakage-clean verifier for robot manipulation, is this useful? Am I solving a non-problem? [D]

Spent the last few weeks on a benchmark/harness that tries to answer one question honestly: did a robot arm actually do the demonstrated task, or did the success metric just get fooled? The setup: compile a human demo into an object-centric graph (what changed in the world:…

7
r/MachineLearning community 15d ago

[ECCV 2026] Final Decisions [D]

ECCV 2026 final decisions are expected to be released on June 17, 2026. Since no exact release time was specified, results will likely roll out within 48 hours. This thread is for everyone to share updates, discuss outcomes, and support each other through the decisions.…

26
r/MachineLearning community 16d ago

Source code for LLMs. [D]

I was digging through Hugging Face’s Transformers repo and found https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt_oss/modeling_gpt_oss.py From what I can tell, this isn’t just boilerplate; it looks like a full implementation. Is it actually the…

15
r/MachineLearning community 16d ago

quicktok: a faster tokenizer (exact and byte-identical with tiktoken) [P]

Been working on this a while! Should be useful for anyone trying to speed up their tokenization workflows. quicktok is a fast/exact BPE tokenizer written in C++. Token ids are byte-identical to tiktoken and encoding runs 2–3.6× faster than bpe-openai (the fastest alternative I…

25
r/MachineLearning community 16d ago

How the brains learn [R]

Abstract: A sufficient account of how the neocortex learns must meet three criteria: Computationally, it must approximate a powerful, general-purpose learning algorithm known to scale to human-level intelligence; Algorithmically, it must be implementable using known,…

9
r/MachineLearning community 16d ago

Cleo: trying to fit full analyst behavior in a 2B model [P]

Hello all! Half of all industrial "chatbots" are just text-to-SQL models in a trenchcoat (and the other half RAG!). I wanted to explore just how small you could make these models if you trained, evaluated, and ran inference in the exact same structured harness, leading to Cleo:…

4
r/MachineLearning community 16d ago

Embedded/edge ML folks: what actually eats the most time, getting data or cleaning/labeling it (time-series sensor data, not computer vision/audio)? [D]

I'm trying to understand where people doing sensor-based ML on microcontrollers (IMU, accelerometer, vibration, that kind of time-series data) actually lose the most time. When you've built something like this, what was the bottleneck: getting enough real-world data in the first…

6
