Tag

Training

451 articles archived under #training · RSS

arXiv — Machine Learning research 1mo ago

Learning When to Adapt

arXiv:2605.19028v1 Announce Type: new Abstract: Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning method, yet its learned correction is static: the same low-rank update is applied to every input. This input-agnostic approach creates an inevitable…

38
arXiv — NLP / Computation & Language research 1mo ago

Fine-tuning language encoding models on slow fMRI improves prediction for fast ECoG

arXiv:2605.19224v1 Announce Type: new Abstract: Neuroscientists have recently turned to intracranial brain recording methods, like electrocorticography (ECoG), for human experiments because of the fine spatial and temporal resolution that they afford. Models trained on this…

11
arXiv — NLP / Computation & Language research 1mo ago

EmbGen: Teaching with Reassembled Corpora

arXiv:2605.19394v1 Announce Type: new Abstract: Adapting small instruction-tuned models to specialized domains often relies on supervised fine-tuning (SFT) on curated instruction-response examples, which is expensive to collect at scale. Synthetic training examples generated by…

29
Vercel — AI dev-tools 1mo ago

Chat SDK now supports callback URLs on buttons and modals

You can now pause a Workflow run on a Chat SDK card and resume it when someone clicks a button. The same flow works for form submissions. Buttons and modals accept a new callbackUrl prop, and the event payload is sent to that endpoint. To build a card like this, create a…

36
TechCrunch — AI news-outlet 1mo ago

OpenAI co-founder Andrej Karpathy joins Anthropic’s pre-training team

Andrej Karpathy has joined Anthropic to work on pre-training. He previously co-founded and worked at OpenAI and led computer vision and AI at Tesla.

27
arXiv — Machine Learning research 1mo ago

Goal-Conditioned Supervised Learning for LLM Fine-Tuning

arXiv:2605.16345v1 Announce Type: new Abstract: Large language models often require fine-tuning to better align their behavior with user intent at deployment. Existing approaches are commonly divided into online and offline paradigms. Online methods, such as RL-based alignment,…

28
arXiv — Machine Learning research 1mo ago

Flow-Direct: Feedback-Efficient and Reusable Guidance for Flow Models via Non-Parametric Guidance Field

arXiv:2605.16348v1 Announce Type: new Abstract: Training-free guidance enables pre-trained diffusion and flow models to optimize application-specific objectives using feedback from external black-box reward functions. However, existing methods are feedback-inefficient because…

24
arXiv — Machine Learning research 1mo ago

LEAF: A Living Benchmark for Event-Augmented Forecasting

arXiv:2605.16358v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly applied to forecasting. To evaluate this capability while mitigating pre-training data contamination, several living benchmarks have been proposed. However, existing benchmarks either…

33
arXiv — Machine Learning research 1mo ago

Strategic Over-Parameterization for Generalizable Low-Rank Adaptation

arXiv:2605.16470v1 Announce Type: new Abstract: Adapting large language models (LLMs) to downstream tasks via full fine-tuning is increasingly impractical due to its computational and memory demands. Parameter-efficient fine-tuning (PEFT) approaches such as Low-Rank Adaptation…

30
arXiv — Machine Learning research 1mo ago

Scalable Knowledge Editing for Mixture-of-Experts LLMs via Tensor-Structured Updates

arXiv:2605.16686v1 Announce Type: new Abstract: Knowledge editing (KE) provides a lightweight alternative to repeated fine-tuning of LLMs. However, most existing KE methods target dense feed-forward layers, while modern LLMs increasingly adopt Mixture-of-Experts (MoE)…

14
arXiv — Machine Learning research 1mo ago

UB-SMoE: Universally Balanced Sparse Mixture-of-Experts for Resource-adaptive Federated Fine-tuning of Foundation Models

arXiv:2605.16690v1 Announce Type: new Abstract: Heterogeneous LoRA-rank methods address system heterogeneity in federated fine-tuning of foundation models by assigning client-specific ranks based on computational capabilities. However, these methods achieve only marginal…

32
arXiv — NLP / Computation & Language research 1mo ago

MixSD: Mixed Contextual Self-Distillation for Knowledge Injection

arXiv:2605.16865v1 Announce Type: new Abstract: Supervised fine-tuning (SFT) is widely used to inject new knowledge into language models, but it often degrades pretrained capabilities such as reasoning and general-domain performance. We argue this forgetting arises because…

15
arXiv — NLP / Computation & Language research 1mo ago

Evaluation Drift in LLM Personality Induction: Are We Moving the Goalpost?

arXiv:2605.16996v1 Announce Type: new Abstract: Can large language models reliably express a human-like personality, or are they merely mimicking surface cues without a stable underlying profile? To investigate this, we induce personality in LLMs by fine-tuning them on the…

14
arXiv — NLP / Computation & Language research 1mo ago

Weak-to-Strong Elicitation via Mismatched Wrong Drafts

arXiv:2605.17314v1 Announce Type: new Abstract: We consider whether off-policy experience from a smaller, weaker model can elicit capability in a stronger learner that on-policy RL fine-tuning (e.g., GRPO) does not reach. We find that injecting mathematically wrong drafts from a…

25
arXiv — NLP / Computation & Language research 1mo ago

Transitivity Meets Cyclicity: Explicit Preference Decomposition for Dynamic Large Language Model Alignment

arXiv:2605.17342v1 Announce Type: new Abstract: Standard RLHF relies on transitive scalar rewards, failing to capture the cyclic nature of human preferences. While some approaches like the General Preference Model (GPM) address this, we identify a theoretical limitation: their…

11
arXiv — NLP / Computation & Language research 1mo ago

Internalizing Tool Knowledge in Small Language Models via QLoRA Fine-Tuning

arXiv:2605.17774v1 Announce Type: new Abstract: Large language models are increasingly used as planning components in agentic systems, but current tool-use pipelines often require full tool schemas to be included in every prompt, creating substantial token overhead and limiting…

24
arXiv — NLP / Computation & Language research 1mo ago

A Data-Efficient Path to Multilingual LLMs: Language Expansion via Post-training PARAM$\Delta$ Integration into Upcycled MoE

arXiv:2605.18083v1 Announce Type: new Abstract: Expanding Large Language Models (LLMs) to new languages is a costly endeavor, demanding extensive Continued Pre-Training (CPT) and data-intensive alignment. While recent data-free merging techniques attempt to bypass alignment by…

30
arXiv — NLP / Computation & Language research 1mo ago

Ancient Greek to Modern Greek Machine Translation: A Novel Benchmark and Fine-Tuning Experiments on LLMs and NMT Models

arXiv:2605.18504v1 Announce Type: new Abstract: Machine Translation (MT) for Ancient Greek (AG) to Modern Greek (MG) is a low-resource task, constrained by the lack of large-scale, high-quality parallel data. We address this gap by introducing the AG-MG Parallel Corpus, a new…

19
Hugging Face official-blog 1mo ago

Fine-Tuning NVIDIA Cosmos Predict 2.5 with LoRA/DoRA for Robot Video Generation

Published May 18, 2026. Authors (NVIDIA): Ting-Yun Chang, Miguel Martin, Jonathan Allen, Ke Ding…

11
Hugging Face Daily Papers research 1mo ago

Follow the Mean: Reference-Guided Flow Matching

Abstract: Flow matching enables controllable generation through example-based adaptation via conditional endpoint mean adjustment, offering training-free and parametric guidance methods for style and content control. AI-generated summary: Existing approaches to controllable…

23
Hugging Face Daily Papers research 1mo ago

Sparse Autoencoders enable Robust and Interpretable Fine-tuning of CLIP models

Abstract: SAE-FT enables robust fine-tuning of vision-language models by regularizing visual representations through sparse autoencoder constraints, maintaining performance while improving robustness against distribution shifts. AI-generated summary: Large-scale pre-trained…

34
arXiv — Machine Learning research 1mo ago

TeamTR: Trust-Region Fine-Tuning for Multi-Agent LLM Coordination

arXiv:2605.15207v1 Announce Type: new Abstract: Multi-agent LLM systems have shown promise for complex reasoning, yet recent evaluations reveal they often underperform single-model baselines. We identify a structural failure mode in sequential fine-tuning of shared-context…

29
arXiv — Machine Learning research 1mo ago

Reducing the Safety Tax in LLM Safety Alignment with On-Policy Self-Distillation

arXiv:2605.15239v1 Announce Type: new Abstract: Safety alignment often improves robustness to harmful queries at the cost of reasoning ability, a tradeoff known as the safety tax. A common cause is distributional mismatch: supervised fine-tuning trains the target model on safety…

18
arXiv — Machine Learning research 1mo ago

Tadpole: Autoencoders as Foundation Models for 3D PDEs with Online Learning

arXiv:2605.15284v1 Announce Type: new Abstract: We introduce Tadpole, a novel foundation model for three-dimensional partial differential equations (PDEs) that addresses key challenges in transferability, scalability to high dimensionality, and multi-functionality. Tadpole is…

12
arXiv — Machine Learning research 1mo ago

Representation Without Reward: A JEPA Audit for LLM Fine-Tuning

arXiv:2605.15394v1 Announce Type: new Abstract: Joint-embedding predictive architectures (JEPAs) propose that a model should learn more useful abstractions when trained to predict latent representations rather than observed outputs. For autoregressive language-model fine-tuning…

30
arXiv — Machine Learning research 1mo ago

Towards Code-Oriented LM Embeddings for Surrogate-Assisted Neural Architecture Search

arXiv:2605.15649v1 Announce Type: new Abstract: Developing effective surrogates (performance predictors) for Neural Architecture Search (NAS) typically requires expensive fine-tuning or the engineering of complex representations. We propose a low-cost embedding strategy that…

36
arXiv — Machine Learning research 1mo ago

AOT-POT: Adaptive Operator Transformation for Large-Scale PDE Pre-training

arXiv:2605.15793v1 Announce Type: new Abstract: Pre-training neural operators on diverse partial differential equation (PDE) datasets has emerged as a promising direction for building general-purpose surrogate models in scientific machine learning. However, the inherent…

27
arXiv — Machine Learning research 1mo ago

CHoE: Cross-Domain Heterogeneous Graph Prompt Learning via Structure-Conditioned Experts

arXiv:2605.15888v1 Announce Type: new Abstract: Heterogeneous Graph Prompt Learning (HGPL) has emerged as a promising paradigm for bridging the gap between the objectives of pre-training foundation models and their downstream applications in heterogeneous graph settings. However,…

31
arXiv — Machine Learning research 1mo ago

LoCO: Low-rank Compositional Rotation Fine-tuning

arXiv:2605.15916v1 Announce Type: new Abstract: Parameter-efficient fine-tuning (PEFT) has emerged as a critical technique for adapting large-scale foundation models across natural language processing and computer vision. While existing methods such as low-rank adaptations…

4
arXiv — NLP / Computation & Language research 1mo ago

Toward LLMs Beyond English-Centric Development

arXiv:2605.15613v1 Announce Type: new Abstract: Through an analysis of sequences generated by open-weight large language models (LLMs), we demonstrate that LLMs are heavily biased toward English. While continual pre-training is commonly used to adapt LLMs to a target language,…

19
arXiv — NLP / Computation & Language research 1mo ago

Reference-Free Reinforcement Learning Fine-Tuning for MT: A Seq2Seq Perspective

arXiv:2605.15976v1 Announce Type: new Abstract: Production machine translation relies overwhelmingly on encoder-decoder Seq2Seq models, yet reinforcement learning approaches to MT fine-tuning have largely targeted decoder-only LLMs at $\geq$7B parameters, with limited systematic…

17
arXiv — NLP / Computation & Language research 1mo ago

From Feedback Loops to Policy Updates: Reinforcement Fine-Tuning for LLM-Based Alpha Factor Discovery

arXiv:2605.15412v1 Announce Type: cross Abstract: Modern quantitative trading increasingly relies on systematic models to extract predictive signals from large-scale financial data, where alpha factor discovery plays a central role in transforming market observations into…

13
arXiv — NLP / Computation & Language research 1mo ago

Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training

arXiv:2506.01732v3 Announce Type: replace Abstract: Large Language Models (LLMs) are pre-trained on large amounts of data from different sources and domains. Such datasets often contain trillions of tokens, including large portions of copyrighted or proprietary content, which…

11
r/LocalLLaMA community 1mo ago

Gemma-4-Gembrain-31B-it-uncensored-heretic Is Out Now, a Merge of Multiple Gemma 4 31B it Finetunes Designed to Boost Logical and Lateral Thinking for Improved Adherence, Increased Swipe Variety and Enhanced Creative Prose, With KLD of 0.0186 and 13/100 Refusals!

Provided in both Safetensors and GGUFs. Safetensors: llmfan46/Gemma-4-Gembrain-31B-it-uncensored-heretic: https://huggingface.co/llmfan46/Gemma-4-Gembrain-31B-it-uncensored-heretic GGUFs: llmfan46/Gemma-4-Gembrain-31B-it-uncensored-heretic-GGUF:…

19
r/LocalLLaMA community 1mo ago

G4-Meromero-31B-Uncensored-Heretic Is Out Now, a Finetune of Gemma 4 31B It Designed for Creative Tasks, With Kld of 0.0100 and 15/100 Refusals!

Provided in both Safetensors and GGUFs. Safetensors: llmfan46/G4-MeroMero-31B-uncensored-heretic: https://huggingface.co/llmfan46/G4-MeroMero-31B-uncensored-heretic GGUFs: llmfan46/G4-MeroMero-31B-uncensored-heretic-GGUF:…

29
r/LocalLLaMA community 1mo ago

gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic is Out Now, A Writing Finetune that Aims to Improve Gemma 4 31B it Writing Quality with More Natural English and Better Prose, Good for Creative Writings, Translations and RPs!

Provided in both Safetensors and GGUFs. llmfan46/gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic: https://huggingface.co/llmfan46/gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic…

38
r/LocalLLaMA community 1mo ago

LLM Phone Home: Reliable Apps that can deliver inference from local backend

Hello all, I’m wondering what suggestions there are for an iOS app that can serve an OpenAI-compatible endpoint. I am using 3sparks, which works GREAT for that specific use, BUT there is no MCP, no web search, etc. I want to show people that a local model with web search on your…

25
r/LocalLLaMA community 1mo ago

Best dataset for model pre-training

Well, alright, I want to pretrain a ~100M-parameter model on an NVIDIA L4 (24GB VRAM). Any good dataset (and quantity of tokens) to pretrain on? Submitted by /u/Ok-Type-7663

15
Hugging Face Daily Papers research 1mo ago

Long Context Pre-Training with Lighthouse Attention

Abstract: Lighthouse Attention enables efficient training of causal transformers at long sequences by using hierarchical selection-based attention that reduces computational complexity while maintaining model performance. AI-generated summary: Training causal transformers at…

33
Hugging Face Daily Papers research 1mo ago

Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance

Abstract: FEST is a few-shot demonstration-guided reinforcement learning algorithm that achieves strong performance with minimal supervised fine-tuning data by combining supervised signals, on-policy learning, and weighted training to prevent overfitting. AI-generated summary…

22
r/LocalLLaMA community 1mo ago

[FOUNDING] SupraLabs - real open-source AI models for you!

Hey r/LocalLLaMA! We founded SupraLabs, and it's huge! What do we do? We train, finetune and explore small models with good results to revolutionize small AI…

30
Hugging Face Daily Papers research 1mo ago

Dynamic Latent Routing

Abstract Temporal composition of sub-policies in MDPs with time-varying rewards enables optimal policy recovery through generalized Dijkstra search, which inspires a dynamic latent routing method for language model fine-tuning that outperforms traditional supervised approaches.…

34
arXiv — Machine Learning research 1mo ago

Towards the Next Frontier of LLMs, Training on Private Data: A Cross-Domain Benchmark for Federated Fine-Tuning

arXiv:2605.13936v1 Announce Type: new Abstract: The recent success of large language models (LLMs) has been largely driven by vast public datasets. However, the next frontier for LLM development lies beyond public data. Much of the world's most valuable information is private,…

33
arXiv — Machine Learning research 1mo ago

ROAD: Adaptive Data Mixing for Offline-to-Online Reinforcement Learning via Bi-Level Optimization

arXiv:2605.14497v1 Announce Type: new Abstract: Offline-to-online reinforcement learning harnesses the stability of offline pretraining and the flexibility of online fine-tuning. A key challenge lies in the non-stationary distribution shift between offline datasets and the…

34
arXiv — NLP / Computation & Language research 1mo ago

PEML: Parameter-efficient Multi-Task Learning with Optimized Continuous Prompts

arXiv:2605.14055v1 Announce Type: new Abstract: Parameter-Efficient Fine-Tuning (PEFT) is widely used for adapting Large Language Models (LLMs) for various tasks. Recently, there has been an increasing demand for fine-tuning a single LLM for multiple tasks because it requires…

4
arXiv — NLP / Computation & Language research 1mo ago

To See is Not to Learn: Protecting Multimodal Data from Unauthorized Fine-Tuning of Large Vision-Language Model

arXiv:2605.14291v1 Announce Type: cross Abstract: The rapid advancement of Large Vision-Language Models (LVLMs) is increasingly accompanied by unauthorized scraping and training on multimodal web data, posing severe copyright and privacy risks to data owners. Existing…

8
Hugging Face Daily Papers research 1mo ago

Visual Aesthetic Benchmark: Can Frontier Models Judge Beauty?

Abstract Current multimodal models struggle to match human expert aesthetic judgment in comparative image selection tasks, as demonstrated by the Visual Aesthetic Benchmark which reveals significant performance gaps and shows that fine-tuning on expert examples can improve…

14
Vercel — AI dev-tools 1mo ago

Trace any Vercel request from the CLI

You can now generate Session Traces through the Vercel CLI. Use the new vercel curl --trace command to generate an OpenTelemetry trace to the specified endpoint from the terminal. Use the new vercel traces get command to fetch the generated trace by request ID. Available on all…

38
r/LocalLLaMA community 1mo ago

Developing open source LLM from ground up from pretrain - rlhf(PPO/GRPO)

Hello, I have been working on creating an LLM from the ground up. It is based on the DeepSeek architecture, optimized for a heavily reduced VRAM footprint (GUM+muon). Currently this is the JSON schema I am using, which should suffice to show what is currently being pretrained. Training on a…

7
r/LocalLLaMA community 1mo ago

Dropping learning rate fixed my Qlora fine-tune more than anything else i tried

Been fine-tuning Llama 3.1 8B with QLoRA for a classification task using about 8k samples. I was getting bad eval results for a while and kept thinking something was wrong with my data. Tried cleaning the dataset, tried different prompt templates, messed with rank and alpha.…

35
