Tag: Training · 451 articles archived under #training arXiv — Machine Learning research 1mo ago Learning When to Adapt arXiv:2605.19028v1 Announce Type: new Abstract: Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning method, yet its learned correction is static: the same low-rank update is applied to every input. This input-agnostic approach creates an inevitable… 38 arXiv — NLP / Computation & Language research 1mo ago Fine-tuning language encoding models on slow fMRI improves prediction for fast ECoG arXiv:2605.19224v1 Announce Type: new Abstract: Neuroscientists have recently turned to intracranial brain recording methods, like electrocorticography (ECoG), for human experiments because of the fine spatial and temporal resolution that they afford. Models trained on this… 11 arXiv — NLP / Computation & Language research 1mo ago EmbGen: Teaching with Reassembled Corpora arXiv:2605.19394v1 Announce Type: new Abstract: Adapting small instruction-tuned models to specialized domains often relies on supervised fine-tuning (SFT) on curated instruction-response examples, which is expensive to collect at scale. Synthetic training examples generated by… 29 Vercel — AI dev-tools 1mo ago Chat SDK now supports callback URLs on buttons and modals You can now pause a Workflow run on a Chat SDK card and resume it when someone clicks a button. The same flow works for form submissions. Buttons and modals accept a new callbackUrl prop, and the event payload is sent to that endpoint. To build a card like this, create a… 36 TechCrunch — AI news-outlet 1mo ago OpenAI co-founder Andrej Karpathy joins Anthropic’s pre-training team Andrej Karpathy has joined Anthropic to work on pre-training. He previously co-founded and worked at OpenAI and led computer vision and AI at Tesla.
27 arXiv — Machine Learning research 1mo ago Goal-Conditioned Supervised Learning for LLM Fine-Tuning arXiv:2605.16345v1 Announce Type: new Abstract: Large language models often require fine-tuning to better align their behavior with user intent at deployment. Existing approaches are commonly divided into online and offline paradigms. Online methods, such as RL-based alignment,… 28 arXiv — Machine Learning research 1mo ago Flow-Direct: Feedback-Efficient and Reusable Guidance for Flow Models via Non-Parametric Guidance Field arXiv:2605.16348v1 Announce Type: new Abstract: Training-free guidance enables pre-trained diffusion and flow models to optimize application-specific objectives using feedback from external black-box reward functions. However, existing methods are feedback-inefficient because… 24 arXiv — Machine Learning research 1mo ago LEAF: A Living Benchmark for Event-Augmented Forecasting arXiv:2605.16358v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly applied to forecasting. To evaluate this capability while mitigating pre-training data contamination, several living benchmarks have been proposed. However, existing benchmarks either… 33 arXiv — Machine Learning research 1mo ago Strategic Over-Parameterization for Generalizable Low-Rank Adaptation arXiv:2605.16470v1 Announce Type: new Abstract: Adapting large language models (LLMs) to downstream tasks via full fine-tuning is increasingly impractical due to its computational and memory demands. Parameter-efficient fine-tuning (PEFT) approaches such as Low-Rank Adaptation… 30 arXiv — Machine Learning research 1mo ago Scalable Knowledge Editing for Mixture-of-Experts LLMs via Tensor-Structured Updates arXiv:2605.16686v1 Announce Type: new Abstract: Knowledge editing (KE) provides a lightweight alternative to repeated fine-tuning of LLMs. 
However, most existing KE methods target dense feed-forward layers, while modern LLMs increasingly adopt Mixture-of-Experts (MoE)… 14 arXiv — Machine Learning research 1mo ago UB-SMoE: Universally Balanced Sparse Mixture-of-Experts for Resource-adaptive Federated Fine-tuning of Foundation Models arXiv:2605.16690v1 Announce Type: new Abstract: Heterogeneous LoRA-rank methods address system heterogeneity in federated fine-tuning of foundation models by assigning client-specific ranks based on computational capabilities. However, these methods achieve only marginal… 32 arXiv — NLP / Computation & Language research 1mo ago MixSD: Mixed Contextual Self-Distillation for Knowledge Injection arXiv:2605.16865v1 Announce Type: new Abstract: Supervised fine-tuning (SFT) is widely used to inject new knowledge into language models, but it often degrades pretrained capabilities such as reasoning and general-domain performance. We argue this forgetting arises because… 15 arXiv — NLP / Computation & Language research 1mo ago Evaluation Drift in LLM Personality Induction: Are We Moving the Goalpost? arXiv:2605.16996v1 Announce Type: new Abstract: Can large language models reliably express a human-like personality, or are they merely mimicking surface cues without a stable underlying profile? To investigate this, we induce personality in LLMs by fine-tuning them on the… 14 arXiv — NLP / Computation & Language research 1mo ago Weak-to-Strong Elicitation via Mismatched Wrong Drafts arXiv:2605.17314v1 Announce Type: new Abstract: We consider whether off-policy experience from a smaller, weaker model can elicit capability in a stronger learner that on-policy RL fine-tuning (e.g., GRPO) does not reach. 
We find that injecting mathematically wrong drafts from a… 25 arXiv — NLP / Computation & Language research 1mo ago Transitivity Meets Cyclicity: Explicit Preference Decomposition for Dynamic Large Language Model Alignment arXiv:2605.17342v1 Announce Type: new Abstract: Standard RLHF relies on transitive scalar rewards, failing to capture the cyclic nature of human preferences. While some approaches like the General Preference Model (GPM) address this, we identify a theoretical limitation: their… 11 arXiv — NLP / Computation & Language research 1mo ago Internalizing Tool Knowledge in Small Language Models via QLoRA Fine-Tuning arXiv:2605.17774v1 Announce Type: new Abstract: Large language models are increasingly used as planning components in agentic systems, but current tool-use pipelines often require full tool schemas to be included in every prompt, creating substantial token overhead and limiting… 24 arXiv — NLP / Computation & Language research 1mo ago A Data-Efficient Path to Multilingual LLMs: Language Expansion via Post-training PARAM$\Delta$ Integration into Upcycled MoE arXiv:2605.18083v1 Announce Type: new Abstract: Expanding Large Language Models (LLMs) to new languages is a costly endeavor, demanding extensive Continued Pre-Training (CPT) and data-intensive alignment. While recent data-free merging techniques attempt to bypass alignment by… 30 arXiv — NLP / Computation & Language research 1mo ago Ancient Greek to Modern Greek Machine Translation: A Novel Benchmark and Fine-Tuning Experiments on LLMs and NMT Models arXiv:2605.18504v1 Announce Type: new Abstract: Machine Translation (MT) for Ancient Greek (AG) to Modern Greek (MG) is a low-resource task, constrained by the lack of large-scale, high-quality parallel data.
We address this gap by introducing the AG-MG Parallel Corpus, a new… 19 Hugging Face official-blog 1mo ago Fine-Tuning NVIDIA Cosmos Predict 2.5 with LoRA/DoRA for Robot Video Generation Published May 18, 2026 Ting-Yun Chang ting-yunc nvidia Miguel Martin miguelmartin-nv nvidia Jonathan Allen nv-spectralflight nvidia Ke Ding kding1… 11 Hugging Face Daily Papers research 1mo ago Follow the Mean: Reference-Guided Flow Matching Abstract Flow matching enables controllable generation through example-based adaptation via conditional endpoint mean adjustment, offering training-free and parametric guidance methods for style and content control. AI-generated summary Existing approaches to controllable… 23 Hugging Face Daily Papers research 1mo ago Sparse Autoencoders enable Robust and Interpretable Fine-tuning of CLIP models Abstract SAE-FT enables robust fine-tuning of vision-language models by regularizing visual representations through sparse autoencoder constraints, maintaining performance while improving robustness against distribution shifts. AI-generated summary Large-scale pre-trained… 34 arXiv — Machine Learning research 1mo ago TeamTR: Trust-Region Fine-Tuning for Multi-Agent LLM Coordination arXiv:2605.15207v1 Announce Type: new Abstract: Multi-agent LLM systems have shown promise for complex reasoning, yet recent evaluations reveal they often underperform single-model baselines. We identify a structural failure mode in sequential fine-tuning of shared-context… 29 arXiv — Machine Learning research 1mo ago Reducing the Safety Tax in LLM Safety Alignment with On-Policy Self-Distillation arXiv:2605.15239v1 Announce Type: new Abstract: Safety alignment often improves robustness to harmful queries at the cost of reasoning ability, a tradeoff known as the safety tax.
A common cause is distributional mismatch: supervised fine-tuning trains the target model on safety… 18 arXiv — Machine Learning research 1mo ago Tadpole: Autoencoders as Foundation Models for 3D PDEs with Online Learning arXiv:2605.15284v1 Announce Type: new Abstract: We introduce Tadpole, a novel foundation model for three-dimensional partial differential equations (PDEs) that addresses key challenges in transferability, scalability to high dimensionality, and multi-functionality. Tadpole is… 12 arXiv — Machine Learning research 1mo ago Representation Without Reward: A JEPA Audit for LLM Fine-Tuning arXiv:2605.15394v1 Announce Type: new Abstract: Joint-embedding predictive architectures (JEPAs) propose that a model should learn more useful abstractions when trained to predict latent representations rather than observed outputs. For autoregressive language-model fine-tuning… 30 arXiv — Machine Learning research 1mo ago Towards Code-Oriented LM Embeddings for Surrogate-Assisted Neural Architecture Search arXiv:2605.15649v1 Announce Type: new Abstract: Developing effective surrogates (performance predictors) for Neural Architecture Search (NAS) typically requires expensive fine-tuning or the engineering of complex representations. We propose a low-cost embedding strategy that… 36 arXiv — Machine Learning research 1mo ago AOT-POT: Adaptive Operator Transformation for Large-Scale PDE Pre-training arXiv:2605.15793v1 Announce Type: new Abstract: Pre-training neural operators on diverse partial differential equation (PDE) datasets has emerged as a promising direction for building general-purpose surrogate models in scientific machine learning. 
However, the inherent… 27 arXiv — Machine Learning research 1mo ago CHoE: Cross-Domain Heterogeneous Graph Prompt Learning via Structure-Conditioned Experts arXiv:2605.15888v1 Announce Type: new Abstract: Heterogeneous Graph Prompt Learning (HGPL) has emerged as a promising paradigm for bridging the gap between the objectives of pre-training foundation models and their downstream applications in heterogeneous graph settings. However,… 31 arXiv — Machine Learning research 1mo ago LoCO: Low-rank Compositional Rotation Fine-tuning arXiv:2605.15916v1 Announce Type: new Abstract: Parameter-efficient fine-tuning (PEFT) has emerged as a critical technique for adapting large-scale foundation models across natural language processing and computer vision. While existing methods such as low-rank adaptations… 4 arXiv — NLP / Computation & Language research 1mo ago Toward LLMs Beyond English-Centric Development arXiv:2605.15613v1 Announce Type: new Abstract: Through an analysis of sequences generated by open-weight large language models (LLMs), we demonstrate that LLMs are heavily biased toward English.
While continual pre-training is commonly used to adapt LLMs to a target language,… 19 arXiv — NLP / Computation & Language research 1mo ago Reference-Free Reinforcement Learning Fine-Tuning for MT: A Seq2Seq Perspective arXiv:2605.15976v1 Announce Type: new Abstract: Production machine translation relies overwhelmingly on encoder-decoder Seq2Seq models, yet reinforcement learning approaches to MT fine-tuning have largely targeted decoder-only LLMs at $\geq$7B parameters, with limited systematic… 17 arXiv — NLP / Computation & Language research 1mo ago From Feedback Loops to Policy Updates: Reinforcement Fine-Tuning for LLM-Based Alpha Factor Discovery arXiv:2605.15412v1 Announce Type: cross Abstract: Modern quantitative trading increasingly relies on systematic models to extract predictive signals from large-scale financial data, where alpha factor discovery plays a central role in transforming market observations into… 13 arXiv — NLP / Computation & Language research 1mo ago Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training arXiv:2506.01732v3 Announce Type: replace Abstract: Large Language Models (LLMs) are pre-trained on large amounts of data from different sources and domains. Such datasets often contain trillions of tokens, including large portions of copyrighted or proprietary content, which… 11 r/LocalLLaMA community 1mo ago Gemma-4-Gembrain-31B-it-uncensored-heretic Is Out Now, a Merge of Multiple Gemma 4 31B it Finetunes Designed to Boost Logical and Lateral Thinking for Improved Adherence, Increased Swipe Variety and Enhanced Creative Prose, With KLD of 0.0186 and 13/100 Refusals! Provided in both Safetensors and GGUFs. 
Safetensors: llmfan46/Gemma-4-Gembrain-31B-it-uncensored-heretic: https://huggingface.co/llmfan46/Gemma-4-Gembrain-31B-it-uncensored-heretic GGUFs: llmfan46/Gemma-4-Gembrain-31B-it-uncensored-heretic-GGUF:… 19 r/LocalLLaMA community 1mo ago G4-Meromero-31B-Uncensored-Heretic Is Out Now, a Finetune of Gemma 4 31B It Designed for Creative Tasks, With Kld of 0.0100 and 15/100 Refusals! Provided in both Safetensors and GGUFs. Safetensors: llmfan46/G4-MeroMero-31B-uncensored-heretic: https://huggingface.co/llmfan46/G4-MeroMero-31B-uncensored-heretic GGUFs: llmfan46/G4-MeroMero-31B-uncensored-heretic-GGUF:… 29 r/LocalLLaMA community 1mo ago gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic is Out Now, A Writing Finetune that Aims to Improve Gemma 4 31B it Writing Quality with More Natural English and Better Prose, Good for Creative Writings, Translations and RPs! Provided in both Safetensors and GGUFs. llmfan46/gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic: https://huggingface.co/llmfan46/gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic… 38 r/LocalLLaMA community 1mo ago LLM Phone Home: Reliable Apps that can deliver inference from local backend Hello all, I’m wondering what suggestions there are for an iOS app that can serve an OpenAI-compatible endpoint. I am using 3sparks which works GREAT for that specific use, BUT, there is no MCP, no web search, etc. I want to show people that a local model with web search on your… 25 r/LocalLLaMA community 1mo ago Best dataset for model pre-training Well, alright, I want ~100M parameters on an NVIDIA L4 (24GB VRAM). Any good dataset (and quantity of tokens) to pretrain?
submitted by /u/Ok-Type-7663 15 Hugging Face Daily Papers research 1mo ago Long Context Pre-Training with Lighthouse Attention Abstract Lighthouse Attention enables efficient training of causal transformers at long sequences by using hierarchical selection-based attention that reduces computational complexity while maintaining model performance. AI-generated summary Training causal transformers at… 33 Hugging Face Daily Papers research 1mo ago Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance Abstract FEST is a few-shot demonstration-guided reinforcement learning algorithm that achieves strong performance with minimal supervised fine-tuning data by combining supervised signals, on-policy learning, and weighted training to prevent overfitting. AI-generated summary… 22 r/LocalLLaMA community 1mo ago [FOUNDING] SupraLabs - real open-source AI models for you! https://preview.redd.it/k6lub2ypva1h1.png?width=1500&format=png&auto=webp&s=cd44452c86b5216fec17113a72f43bbf169edafb Hey r/LocalLLaMA ! We founded SupraLabs , and it's huge! What we do? We train, finetune and explore small models with good results to revolutionize small AI… 30 Hugging Face Daily Papers research 1mo ago Dynamic Latent Routing Abstract Temporal composition of sub-policies in MDPs with time-varying rewards enables optimal policy recovery through generalized Dijkstra search, which inspires a dynamic latent routing method for language model fine-tuning that outperforms traditional supervised approaches.… 34 arXiv — Machine Learning research 1mo ago Towards the Next Frontier of LLMs, Training on Private Data: A Cross-Domain Benchmark for Federated Fine-Tuning arXiv:2605.13936v1 Announce Type: new Abstract: The recent success of large language models (LLMs) has been largely driven by vast public datasets. However, the next frontier for LLM development lies beyond public data.
Much of the world's most valuable information is private,… 33 arXiv — Machine Learning research 1mo ago ROAD: Adaptive Data Mixing for Offline-to-Online Reinforcement Learning via Bi-Level Optimization arXiv:2605.14497v1 Announce Type: new Abstract: Offline-to-online reinforcement learning harnesses the stability of offline pretraining and the flexibility of online fine-tuning. A key challenge lies in the non-stationary distribution shift between offline datasets and the… 34 arXiv — NLP / Computation & Language research 1mo ago PEML: Parameter-efficient Multi-Task Learning with Optimized Continuous Prompts arXiv:2605.14055v1 Announce Type: new Abstract: Parameter-Efficient Fine-Tuning (PEFT) is widely used for adapting Large Language Models (LLMs) for various tasks. Recently, there has been an increasing demand for fine-tuning a single LLM for multiple tasks because it requires… 4 arXiv — NLP / Computation & Language research 1mo ago To See is Not to Learn: Protecting Multimodal Data from Unauthorized Fine-Tuning of Large Vision-Language Model arXiv:2605.14291v1 Announce Type: cross Abstract: The rapid advancement of Large Vision-Language Models (LVLMs) is increasingly accompanied by unauthorized scraping and training on multimodal web data, posing severe copyright and privacy risks to data owners. Existing… 8 Hugging Face Daily Papers research 1mo ago Visual Aesthetic Benchmark: Can Frontier Models Judge Beauty? Abstract Current multimodal models struggle to match human expert aesthetic judgment in comparative image selection tasks, as demonstrated by the Visual Aesthetic Benchmark which reveals significant performance gaps and shows that fine-tuning on expert examples can improve… 14 Vercel — AI dev-tools 1mo ago Trace any Vercel request from the CLI You can now generate Session Traces through the Vercel CLI. Use the new vercel curl --trace command to generate an OpenTelemetry trace to the specified endpoint from the terminal. 
Use the new vercel traces get command to fetch the generated trace by request ID. Available on all… 38 r/LocalLLaMA community 1mo ago Developing open source LLM from ground up from pretrain - rlhf(PPO/GRPO) Hello, I have been working on creating an LLM from the ground up. It is based on the DeepSeek architecture, heavily optimized for a reduced VRAM footprint (GUM+muon). Currently this is the JSON schema I am using, which should suffice to show what is currently being pretrained. Training on a… 7 r/LocalLLaMA community 1mo ago Dropping learning rate fixed my Qlora fine-tune more than anything else i tried Been fine-tuning Llama 3.1 8B with QLoRA for a classification task using about 8k samples. I was getting bad eval results for a while and kept thinking something was wrong with my data. Tried cleaning the dataset, tried different prompt templates, messed with rank and alpha.… 35