r/MachineLearning

500 articles archived

r/MachineLearning community 1mo ago

First-time ICML workshop acceptance (GlobalSouthML) but can't afford to travel to South Korea. What are my options? [D]

Hey everyone, I’m an undergrad from India and I just found out I had two papers accepted at the ICML 2026 GlobalSouthML workshop! I am super excited since this is my first time getting accepted into a major conference venue, but I’m also kind of panicking right now because I…

8
r/MachineLearning community 1mo ago

Need reliable source for 30+ years of S&P 500 historical data for LSTM/Transformer research [P]

Hi everyone, I'm starting a research project on financial time-series forecasting using LSTM and Transformer models for predicting S&P 500 market direction. Right now, I'm struggling with obtaining reliable long-term historical data. I tried Yahoo Finance, but downloads are…

8
r/MachineLearning community 1mo ago

We built a tool that installs frameworks like ComfyUI, Ollama, OpenWebUI etc on any cloud GPU in one command and saves your whole setup between sessions [R]

We kept running into the same problem: every time we rented a GPU to run Ollama + OpenWebUI or ComfyUI, we'd spend the first 45 minutes reinstalling everything. Custom nodes, models, configs, all of it. Docker images went stale fast, different providers had different base images,…

16
r/MachineLearning community 1mo ago

Architecture advice: Real-time pipeline for YouTube Audio -> Whisper -> LLM -> SSE (Sub-10s latency) [D]

Hey everyone, I’m building a backend that analyzes long YouTube videos using an LLM. Currently, my flow is a slow waterfall: Download full audio -> Whisper -> LLM -> Return results. For a 30-minute video, the user waits forever. I want to pipeline this for real-time SSE…
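
One way to picture the pipelined alternative to the waterfall is a bounded queue between a transcription stage and an analysis stage, so that results stream out chunk by chunk. The sketch below is only an illustration under stated assumptions: `audio_chunks`, `transcribe`, and `analyze` are hypothetical stubs standing in for the download, Whisper, and LLM steps, and nothing here reflects the poster's actual backend. Only the `data: ...` framing followed by a blank line is taken from the SSE wire format.

```python
import asyncio

async def audio_chunks(n_chunks):
    # Hypothetical stand-in for a streaming audio download: yields chunk ids.
    for i in range(n_chunks):
        await asyncio.sleep(0)
        yield f"audio-{i}"

async def transcribe(chunk):
    # Hypothetical stub where a Whisper call on one chunk would go.
    await asyncio.sleep(0)
    return f"text({chunk})"

async def analyze(text):
    # Hypothetical stub where an LLM call on one transcript piece would go.
    await asyncio.sleep(0)
    return f"analysis({text})"

async def producer(n_chunks, queue):
    # Stage 1: transcribe chunks as they arrive and hand them downstream.
    async for chunk in audio_chunks(n_chunks):
        await queue.put(await transcribe(chunk))
    await queue.put(None)  # sentinel: no more transcripts

async def sse_events(n_chunks):
    # Stage 2: analyze each transcript while stage 1 keeps working,
    # emitting one SSE-framed message per chunk.
    queue = asyncio.Queue(maxsize=4)  # bounded, so stage 1 cannot run far ahead
    task = asyncio.create_task(producer(n_chunks, queue))
    while (text := await queue.get()) is not None:
        yield f"data: {await analyze(text)}\n\n"
    await task

async def collect(n_chunks):
    # Helper for trying the sketch without an HTTP server.
    return [event async for event in sse_events(n_chunks)]

if __name__ == "__main__":
    for event in asyncio.run(collect(3)):
        print(event, end="")
```

With real stages, the time to the first event would depend on the first chunk rather than on the whole video, which is the point of pipelining.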

5
r/MachineLearning community 1mo ago

Released a free 9.8M doc Indic multilingual corpus — Hindi, Bengali, Tamil, Telugu + 7 more (CC0, HuggingFace) [P]

Built this over the past few weeks as part of a multilingual research project. Figured I'd share it here. Check it out! ~9.8M web documents across 11 languages — hi, bn, ta, te, mr, gu, kn, ml, pa, ur, en. ~8.4B tokens. CC0 license. 🤗…

21
r/MachineLearning community 1mo ago

MLRC 2026 is open for submissions - an official track at NeurIPS 2026 [N]

The annual Machine Learning Reproducibility Challenge (MLRC) 2026 is now open for submissions. This year, it is held as an official track at NeurIPS 2026 - submissions, once accepted through TMLR, will be eligible to be presented at the conference in Sydney, Australia this…

20
r/MachineLearning community 1mo ago

THIS IS VERY ANNOYING. Why are my agents misbehaving? [D]

As you can tell, I am a human so no I am NOT going to sit around and wait for my agents to behave properly. I am not their dad, instead I'm their creator. When I tell an agent to perform or behave a certain way, I would obviously expect it to do so. Now I understand…

28
r/MachineLearning community 1mo ago

No new paper under review in TMLR since May 09? [D]

Why is that? Link: https://openreview.net/group?id=TMLR&referrer=%5BHomepage%5D(%2F)#tab-under-review-submissions It seems no action editor assignments have happened for over a week now. submitted by /u/hyperactve

28
r/MachineLearning community 1mo ago

Rewriting model inference with CUDA kernels: the bottleneck was not just GEMM [P]

I’ve been working on a CUDA-first inference runtime for small-batch / realtime ML workloads. The core idea is simple: instead of treating PyTorch / TensorRT / generic graph runtimes as the main execution path, I rewrite the model inference path directly with C++/CUDA kernels.…

13
r/MachineLearning community 1mo ago

Has anyone received decisions for the ICML 2026 GlobalSouthML workshop yet? [D]

Hey everyone! The decision notification deadline for the GlobalSouthML workshop was originally May 15th (and the site updated it to May 17th AoE), but my OpenReview dashboard still just says "0 Official Reviews Submitted". I know workshop timelines can be a bit chaotic and delays…

14
r/MachineLearning community 1mo ago

Witchcraft, fast local semantic search on top of SQLite [P]

Witchcraft (https://github.com/dropbox/witchcraft), an open source project that I built at Dropbox, is a from-scratch re-implementation of Stanford's XTR-Warp semantic search engine (https://github.com/jlscheerer/xtr-warp) in safe Rust, using a single-file SQLite database…

32
r/MachineLearning community 1mo ago

AI/ML Ethicists [D]

So I’ve been working with AI/ML for the past couple of years, and it has been an amazing experience. I still remember using GPT-2 for the first time and being completely blown away by it. Seeing how far the technology has come since then is honestly mind-blowing. I genuinely…

15
r/MachineLearning community 1mo ago

Is the future of coding agents JEPA? [D]

I heard Yann LeCun explain JEPA (Joint Embedding Predictive Architecture) recently and I started thinking about using it for coding agents. Most coding agents today work by throwing a huge amount of text into a frontier LLM and asking it to generate the next patch. That is…

29
r/MachineLearning community 1mo ago

Will waitlisted ones be mailed regardless? EEML 26 [D]

They said, "We aim to inform you by May 18th if a place becomes available." Does that mean no reply if not accepted? I so wish I could be there. submitted by /u/Active-Tip3130

29
r/MachineLearning community 1mo ago

Can't post anything on Reddit [D]

Apparently, everyone can self-promote, link their GitHub and post any kind of thing while EVERY SINGLE POST I TRY TO MAKE GETS REMOVED WITH EXCUSES. My recent post was a complaint about Minimax and got removed because of "self-promoting" while on top of this group there was a post…

11
r/MachineLearning community 1mo ago

Sub-JEPA: a simple fix to LeCun group's LeWorldModel that consistently improves performance [P]

World models learn compact latent representations for planning without pixel reconstruction. LeWorldModel (LeWM), from LeCun's group at NYU, achieves stable end-to-end JEPA training by enforcing an isotropic Gaussian prior over the full latent space. The flaw: real environment…

12
r/MachineLearning community 1mo ago

Reviving PapersWithCode (by Hugging Face) [P]

Hi, Niels here from the open-source team at Hugging Face. Like many others, I was a huge fan of paperswithcode. Sadly, that website is no longer maintained after its acquisition by Meta. Hence, I've been working on reviving it. I obviously use AI agents to parse papers at scale…

10
r/MachineLearning community 1mo ago

Scaling LLMs horizontally: hidden-state coupling without weight modification [R]

Residual Coupling (RC) connects frozen language models in parallel using small, learned linear bridge projections. These bridges read hidden states from one model and inject additive updates into the residual stream of another at intermediate layers. In bilateral setups,…

17
r/MachineLearning community 1mo ago

The 1/√d_k scaling in attention isn't Numerical Stability: Here's the actual math and why it breaks without it [D]

Every resource says "We scale by 1/√d_k to prevent softmax saturation." Almost none of them explain why saturation happens or why that specific scaling constant appears. When you compute Q·Kᵀ without scaling, each element is a dot product of two d_k-dimensional vectors. If the…
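
The dot-product claim can be checked numerically. The sketch below assumes the components of q and k are independent with zero mean and unit variance (an assumption made here for illustration; the excerpt is cut off before stating its own). Under that assumption the variance of q·k grows linearly with d_k, and dividing by √d_k brings it back to roughly 1. The function name and sample sizes are illustrative choices.

```python
import numpy as np

def dot_product_variance(d_k, n_samples=10000, seed=0):
    # Assumption: q and k have i.i.d. standard normal components.
    rng = np.random.default_rng(seed)
    q = rng.standard_normal((n_samples, d_k))
    k = rng.standard_normal((n_samples, d_k))
    scores = np.sum(q * k, axis=1)    # one unscaled attention logit per sample
    scaled = scores / np.sqrt(d_k)    # the same logit after 1/sqrt(d_k) scaling
    return scores.var(), scaled.var()

for d_k in (16, 64, 256):
    raw, scaled = dot_product_variance(d_k)
    print(f"d_k={d_k:4d}  var(q.k)={raw:8.2f}  var(q.k/sqrt(d_k))={scaled:5.2f}")
```

Larger unscaled variance means larger-magnitude logits, which is what pushes softmax toward a near one-hot output.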

21
r/MachineLearning community 1mo ago

ICML financial aid [D]

I am an undergraduate student from India who recently got accepted to TAIGR, an ICML workshop for a Poster. I will be requiring financial aid for registration fees and accommodation, since I will be travelling to Seoul and it is independent research so we don't have any backing…

30
r/MachineLearning community 1mo ago

could refusal layers be masking dialect-conditioned safety failures in MoE models [D]

I set out to test whether AAVE-coded (African American Vernacular English) prompts cause MoE language models to route, deliberate, and respond differently from semantically matched AE (Academic English) prompts in safety-sensitive situations, especially when refusal behavior is…

35
r/MachineLearning community 1mo ago

model-agnostic sensitivity approximator [P]

(to preface, i'm 16 and this is the first package i've ever built. any feedback would be appreciated!) what i've noticed is that most industry-standard xai tools (think shap/lime) focus on feature attribution (why did the model make this prediction), but they don't do anything…

22
r/MachineLearning community 1mo ago

Would a new result in pre-print be considered by reviewers? [D]

So I have a bit of a weird question; suppose you were reviewing a paper. The paper is otherwise ok, but you notice that the authors left a giant elephant in the room unaddressed, either experiment wise or theoretical result wise. But then you become curious and you look up the…

17
r/MachineLearning community 1mo ago

How are you handling training data when public datasets don't match your use case? [D]

Public datasets on HF or Kaggle can sometimes be too generic, wrong domain, wrong schema, outdated, or just not enough volume to generalize properly. Collecting real-world proprietary data takes months. What do people actually do? From what I have seen, the options tend to be: -…

32
r/MachineLearning community 1mo ago

#1 on memory benchmark LongMemEval with Gemini Flash, not Pro [R]

Disclosure: first author. Evaluation of an experimental memory retrieval system against LongMemEval (Wang et al., 2024). Figured the results might be of interest here, particularly the deliberate use of a smaller answering model to isolate retrieval quality from model…

32
r/MachineLearning community 1mo ago

Slop is making me feel disconnected from AI Research [D]

Hello everyone. This is just a small rant on my part. I’m relatively young, a final year undergrad, and I’ve been interested in AI research since I was in high school. Over that period of time I feel there has been a significant shift in the landscape regarding the culture…

37
r/MachineLearning community 1mo ago

Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention [P]

submitted by /u/seraschka

34
r/MachineLearning community 1mo ago

Help in ML algos [D]

So see, I’ve learned ML algorithms theoretically, but practically I have little to no experience. So can you guys suggest some resources through which I can understand which algorithms work well on which kinds of datasets? How is everything done step by step?   submitted by…

25
r/MachineLearning community 1mo ago

DeepSeek Exposed: Users Can Access Each Other's Conversations with a Special Input [D]

A recent security report has revealed a critical privacy flaw in DeepSeek: simply entering a specific character in the input field can expose other users' conversations. This has raised serious concerns about the platform's session isolation and data security. The bigger…

4
r/MachineLearning community 1mo ago

ML lead vs PM on eval-methodology layer independence. who's actually right here? [D]

got into an argument with our ML lead at 11pm yesterday about an eval methodology a PM had built off a framework she learned at an AI PM cohort. she's claiming a layered defense framework, he's saying the layers are statistically conditioned and her independence claim is wrong.…

10
r/MachineLearning community 1mo ago

Help with CNNs. [D]

So, I’ve learned CNNs theoretically, but now I want to see how they behave practically, specifically on images: where they work well, where they fail, and how to improve their performance, etc. So, please suggest some resources or projects through which I can explore this…

21
r/MachineLearning community 1mo ago

Program misleading high school students into paying to perform academic misconduct in ML Research [D]

I was browsing OpenReview and I came across this person called Kevin Zhu https://openreview.net/profile?id=~Kevin_Zhu3 , let's say I was impressed when I saw 158 publications and 468 coauthors, and out of curiosity I searched up his affiliation ( https://algoverseairesearch.org/…

16
r/MachineLearning community 1mo ago

Anyone from India attending EEML? [D]

I got accepted into EEML and I’m a little confused about travel and stay. Has anyone else from India been accepted? Let’s connect! submitted by /u/Suhan_XD

33
r/MachineLearning community 1mo ago

Made and Published a Paper Comparing Analysis of CNN and Vision Transformer Architectures for Brain Tumor Detection [R]

Hi everyone 😄 A while ago I worked on a project where I compared computer vision architectures on detecting and classifying brain tumors in brain MRI scans. I was looking for some feedback on the methodology and really anything else--just simple research stuff. This isn't meant…

8
r/MachineLearning community 1mo ago

Do you agree with Judea that learning from data is not everything? [D]

Link: Judea Pearl, 2011 ACM Turing Award Recipient (2:18:05) Quote: There is a limitation to that which people not everybody understand. I already mentioned a limitation that you have a hierarchy here and going from correlation to causation and from causation from causation to…

33
r/MachineLearning community 1mo ago

Backlash against Arxiv's proposed 1 year ban is genuinely perplexing. [D]

Anyone else surprised at the enormous amount of backlash against Arxiv's proposed 1 year ban for authors and coauthors publishing papers with hallucinated references and other obvious LLM/Gen AI artifacts? https://x.com/tdietterich/status/2055000956144935055…

28
r/MachineLearning community 1mo ago

Which LLMs are actually best for bleeding-edge Linux/ML debugging workflows in 2026? [R]

I’m trying to optimize an AI workflow for bleeding-edge Linux/ML debugging (Arch/CachyOS, CUDA, Python, unsloth, etc.). Current stack: - Claude = deep reasoning/mastermind - Gemini 3.1 Pro = execution/logistics - Perplexity = retrieval Main problem: Gemini often gives…

4
r/MachineLearning community 1mo ago

KDD 2026 Cycle 2 Results [D]

Results for the research track have been released. submitted by /u/ATadDisappointed

5
r/MachineLearning community 1mo ago

ROCm with PyTorch and PyTorch Lightning seems to still suck for research [D]

So I asked about people's experiences with ROCm in a post a few weeks or so ago: https://www.reddit.com/r/MachineLearning/comments/1t6cng3/rocm_status_in_mid_2026_d/ I actually went and procured an RX 7900 XTX reference version to give it a try. My discovery is that it kind of still…

24
r/MachineLearning community 1mo ago

Doubts Urgent Guys! [R]

For an expensive simulator inside an MCMC DA setup like this, do you see amortised inference (SBI / neural posterior estimation) as more transformative than surrogating the forward model, since it attacks the per-pixel MCMC bottleneck directly? A neural operator framing (FNO /…

29
r/MachineLearning community 1mo ago

Struggling with Overfitting on Medical Imaging Task [D]

Hi everyone, I’m working on a 2-class classification problem (LCA vs. RCA coronary arteries) using 2D X-ray angiograms. I’m currently stuck in a cycle of extreme overfitting and could use some advice on my training strategy. The Setup: Dataset: Small (~900 training frames from…

35
r/MachineLearning community 1mo ago

Notes from evaluating a customer support chat agent system: heuristic evaluators give false signal, retrieval bugs masquerade as LLM failures, and the cost/quality Pareto frontier is rarely where you think [D]

Posting some practical findings from a structured audit of a production customer support RAG system. Methodology and caveats up front. Methodology: 6 representative turns from a real production session as the eval set (small, acknowledged limitation) LLM-as-judge using Claude…

20
r/MachineLearning community 1mo ago

Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion [R]

Paper: https://arxiv.org/abs/2605.12825 Code: https://github.com/chiennv2000/orthrus Disclosure: co-author. Idea: Inject a trainable diffusion attention module into each layer of a frozen AR Transformer. Both heads share one KV cache. Diffusion head projects K=32 tokens in…

21
r/MachineLearning community 1mo ago

PINN is predicting trivial solution for stiff ODE [D]

I am learning physics-informed neural networks. Currently, I am solving a simple second-order ODE (damped harmonic oscillator). The equation is m*d2y/dt2 + mu*dy/dt + k*y = 0 (BCs: y(t=0) = 1, y'(t=0) = 0). I managed to draft the code. It works for k values up to 50. However, when…
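
For this ODE with the stated conditions, the underdamped case has a closed-form solution, which gives a ground truth to compare any PINN prediction against. The sketch below is a reference check only, not the poster's code: the defaults `m=1.0` and `mu=2.0` are illustrative assumptions (the excerpt only mentions k values), and `ode_residual` is a hypothetical helper that estimates the residual by finite differences.

```python
import numpy as np

def damped_oscillator(t, m=1.0, mu=2.0, k=50.0):
    # Closed-form solution of m*y'' + mu*y' + k*y = 0 with y(0)=1, y'(0)=0.
    # Assumption: underdamped regime, i.e. k/m > (mu/(2m))**2.
    delta = mu / (2.0 * m)
    omega_sq = k / m - delta**2
    if omega_sq <= 0:
        raise ValueError("only the underdamped case is handled in this sketch")
    omega = np.sqrt(omega_sq)
    return np.exp(-delta * t) * (np.cos(omega * t) + (delta / omega) * np.sin(omega * t))

def ode_residual(t, y, m=1.0, mu=2.0, k=50.0):
    # Finite-difference estimate of m*y'' + mu*y' + k*y on a uniform grid.
    dt = t[1] - t[0]
    dy = np.gradient(y, dt)
    d2y = np.gradient(dy, dt)
    return m * d2y + mu * dy + k * y

t = np.linspace(0.0, 1.0, 10001)
y = damped_oscillator(t)
# Edge points use one-sided differences and are less accurate, so skip them.
print("max |residual| (interior):", np.abs(ode_residual(t, y)[5:-5]).max())
```

A network that has collapsed to the trivial solution y ≈ 0 also has a near-zero ODE residual, so comparing against `damped_oscillator(t)` directly is the more telling check.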

6
r/MachineLearning community 1mo ago

Looking for a real world dataset (or website where i can find it) [P]

Hi guys, I’m gonna do a data analysis project based on data privacy, bias and data interpretability. For this reason our professor asked for a real world dataset in order to analyze a real case. Additionally I would prefer the least anonymity possible for that dataset in order…

38
r/MachineLearning community 1mo ago

software trying to catch software is officially a dead end [D]

I feel like we've crossed a weird threshold in the generative AI space where the arms race against botnets is just over, and the bots won. I was reading that interview recently where the Reddit CEO was floating the idea of using Face ID and Touch ID just to verify that commenters…

15
r/MachineLearning community 1mo ago

Position paper: using hallucination as a construction instrument to distill task-specific cognitive kernels from frontier models [D]

Background: I am a software developer, not an ML researcher. This started from a practical question — why do AI coding tools send proprietary client code to remote servers when the task only requires Swift? Following that question produced this framework. The core proposal…

8
r/MachineLearning community 1mo ago

Does anyone know any ready-to-go Emotion Cause Extraction (ECE) model? [R]

Hi everyone, I am currently looking for an Emotion Cause Extraction (ECE) model that is ready to go, meaning I can download the model and run it immediately on text. submitted by /u/Mountain_Turnip_6403

26
r/MachineLearning community 1mo ago

It is the process of rapidly ever improving differentiation between noise and signal patterns and constant generalization of those that produces intelligence, not merely compression of data. [D]

Until we can design a mathematical system with one unavoidable intrinsic goal that drives it with undeniable force and encode that to hardware, plug it into a simulator of raw data, and give it the initial faculties to form, store, manipulate and alter all patterns based on its…

23
r/MachineLearning community 1mo ago

How we catch silent NPU fallback on Snapdragon in CI [D]

Posting because I've now seen this exact bug at multiple teams shipping ML to Snapdragon, and the pattern is worth writing up. ONNX Runtime's QNN execution provider (the one that targets Qualcomm's Hexagon NPU on Snapdragon SoCs) will silently route unsupported ops to the CPU.…

31
