Tag

Training

450 articles archived under #training

arXiv — Machine Learning research 28d ago

RL Excursions during Pre-Training: Re-examining Policy Optimization for LLM training

arXiv:2606.04272v1 Announce Type: new Abstract: The standard LLM training pipeline applies reinforcement learning (RL) only after pre-training and supervised fine-tuning (SFT). We question this status quo by training a LLM from scratch and applying RL, SFT, and SFT followed by…

28
arXiv — NLP / Computation & Language research 28d ago

Sparse Mixture-of-Experts Reward Models Learn Interpretable and Specialized Experts for Personalized Preference Modeling

arXiv:2606.04284v1 Announce Type: cross Abstract: Preference modeling plays a central role in reinforcement learning from human feedback (RLHF), enabling large language models (LLMs) to align with human values. However, most existing approaches assume a universal reward…

28
arXiv — Machine Learning research 28d ago

OpenRFM: Dissecting Relational In-Context Learning

arXiv:2606.04320v1 Announce Type: new Abstract: Relational Foundation Models (RFMs) promise a single pre-trained predictor that, given any relational database, returns predictions in one forward pass via relational in-context learning (ICL). Yet a substantial gap separates open…

19
arXiv — Machine Learning research 28d ago

(Mis)generalization of Helpful-only Fine-tuning

arXiv:2606.04413v1 Announce Type: new Abstract: Helpful-only models, that is, models that are trained to always follow user intent, are valuable for dangerous capability evaluations and other areas of AI R&D where refusals would be an obstacle. Little is known about the…

34
arXiv — NLP / Computation & Language research 28d ago

Long Live Fine-Tuning: Task-Specific Transformers Outperform Zero-Shot LLMs for Misinformation Response Classification on Reddit

arXiv:2606.04274v1 Announce Type: new Abstract: As large language models (LLMs) become default tools for online information verification, an implicit assumption follows them: that scale and general capability are sufficient for nuanced classification of misinformation discourse.…

30
arXiv — NLP / Computation & Language research 28d ago

Parameter-Efficient Fine-Tuning with Learnable Rank

arXiv:2606.04325v1 Announce Type: new Abstract: Low-Rank Adaptation (LoRA) is a popular parameter-efficient fine-tuning (PEFT) method that restricts weight updates to low-rank adapters, introducing a fixed low-rank inductive bias by optimizing in a low-dimensional subspace. In…

16
arXiv — NLP / Computation & Language research 28d ago

StepPRM-RTL: Stepwise Process-Reward Guided LLM Fine-Tuning for Enhanced RTL Synthesis

arXiv:2606.04246v1 Announce Type: cross Abstract: Automatic generation of RTL code for digital hardware designs remains challenging due to long-horizon reasoning, multi-step dependencies, and strict correctness constraints in Verilog and VHDL. We present StepPRM-RTL, a novel…

8
r/LocalLLaMA community 28d ago

The first Gemma 4 12B finetunes are ready

Now you can start building your Gemma 4 12B collection :) https://huggingface.co/igorls/gemma-4-12B-it-heretic-GGUF https://huggingface.co/ReadyArt/Melody1437-12B-v0.4-GGUF https://huggingface.co/DuoNeural/Gemma4-12B-IT-Abliterated-GGUF…

26
r/LocalLLaMA community 28d ago

gemma-4-12b-it vs Qwen3.5-9B on shared benchmarks: Qwen is overall winner beating gemma in 5/8 benchmarks despite a smaller footprint

I don't really understand the gemma hype. Qwen outperforms gemma GB for GB, and kv cache is lighter. Sure, gemma-4-12b-it might be a slightly better coder than Qwen3.5-9b, but you could also just use omnicoder-9b (Qwen3.5-9b finetune for coding). Note: Benchmark results come from…

19
r/LocalLLaMA community 28d ago

google/gemma-4-12B · Hugging Face

Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on E2B, E4B, and 12B) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned…

29
Hugging Face Daily Papers research 29d ago

Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking

Abstract Humanoid-GPT is a GPT-style Transformer with causal attention trained on a billion-scale motion corpus that achieves zero-shot generalization to unseen motions and control tasks through scalable pre-training on diverse motion data. Generated by…

29
r/LocalLLaMA community 29d ago

Holo3.1 35B/9B/4B/0.8B (Qwen 3.5 finetunes)

from Hcompany (which seems to be a French company): Holo3.1: Fast & Local Computer Use Agents Model Description Holo3.1 is our latest family of Vision-Language Models (VLMs) for computer use agents. Building on Holo3, it expands support beyond browser and desktop automation to…

25
Hugging Face Daily Papers research 29d ago

Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces

Abstract Answer-correct long chain-of-thought traces can lead to different fine-tuning outcomes, with post-conclusion continuations identified as harmful to training, characterized by uncertainty-geometry mismatches and addressed through a lightweight boundary proxy method.…

26
arXiv — Machine Learning research 29d ago

Pruning Deep Neural Networks via the Marchenko--Pastur Distribution

arXiv:2606.02608v1 Announce Type: new Abstract: We study a Marchenko--Pastur (MP) random-matrix approach to pruning deep neural networks with very small post-pruning fine-tuning budgets. The main practical contribution is accuracy retention under short calibration and…

34
arXiv — Machine Learning research 29d ago

GRZO: Group-Relative Zeroth-Order Optimization for Large Language Model Fine-Tuning

arXiv:2606.02857v1 Announce Type: new Abstract: Zeroth-order (ZO) optimization is a memory-efficient alternative to backpropagation for fine-tuning large language models, but its deployment is limited by the high variance of gradient estimation. We propose GRZO, a Group-Relative…

22
arXiv — Machine Learning research 29d ago

BYORn: Bootstrap Your Own Responses to Defend Large Vision-Language Models Against Backdoor Attacks

arXiv:2606.02947v1 Announce Type: new Abstract: Supervised fine-tuning is the predominant approach for adapting autoregressive vision-language models to downstream tasks. Recent work has shown that this paradigm is highly vulnerable to backdoor attacks, and that existing…

17
arXiv — Machine Learning research 29d ago

CoughSense: Five-Class Respiratory Disease Classification via Whisper Encoder Fine-Tuning and Dual-Encoder Cross-Attention Fusion with Balanced Contrastive Learning

arXiv:2606.02998v1 Announce Type: new Abstract: Automated cough analysis offers a path to low-cost respiratory screening, but most existing work stops at binary COVID-19 detection. A practical tool needs to tell apart several respiratory conditions from one cough recording on a…

4
arXiv — Machine Learning research 29d ago

DECA: Decentralizing Block-Wise Adam for Efficient LLM Full-Parameter Fine-Tuning on Non-IID Data

arXiv:2606.03209v1 Announce Type: new Abstract: Fine-tuning large language models (LLMs) in privacy-sensitive and resource-constrained environments remains challenging. Since training data are often distributed across multiple clients, decentralized fine-tuning offers a natural…

15
arXiv — Machine Learning research 29d ago

When RLHF Fails: A Mechanistic Taxonomy of Reward Hacking, Collapse, and Evaluator Gaming

arXiv:2606.03238v1 Announce Type: new Abstract: Reinforcement learning from human feedback (RLHF) makes large-scale post-training possible by replacing an underspecified human objective with learned and scalable proxies. The same substitution creates a structured failure…

12
arXiv — Machine Learning research 29d ago

Message Tuning Outshines Graph Prompt Tuning: A Prismatic Space Perspective

arXiv:2606.03290v1 Announce Type: new Abstract: Graph Foundation Models (GFMs), built upon the Pre-training and Adaptation paradigm, have emerged as a research hotspot in graph learning. For GNN-based GFMs, graph prompt tuning has become the prevailing adaptation method for…

4
arXiv — NLP / Computation & Language research 29d ago

Regret Pre-training: Bridging Prior and Posterior Views for Enhanced Knowledge Grounding

arXiv:2606.03080v1 Announce Type: new Abstract: Causal language models factorize sequence probabilities using only preceding context, leaving future information unexploited during training despite its availability in the training data. This paper introduces Regret Pre-training,…

31
arXiv — NLP / Computation & Language research 29d ago

The Word and the Way: Strategies for Domain-Specific BERT Pre-Training in German Medical NLP

arXiv:2606.03250v1 Announce Type: new Abstract: Digital healthcare generates vast amounts of clinical text that can support AI-assisted applications, yet German biomedical language models remain limited by older architectures or restricted training data. We present ChristBERT…

33
arXiv — NLP / Computation & Language research 29d ago

From Script to Semantics: Prompting Strategies for African NLI

arXiv:2606.03304v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly evaluated in multilingual settings, yet their inference behavior in low-resource African languages remains underexplored especially under pure prompting without fine-tuning. We present…

38
arXiv — NLP / Computation & Language research 29d ago

Large Language Models Are Overconfident in Their Own Responses

arXiv:2606.03437v1 Announce Type: new Abstract: Prior work has shown that instruction-tuned large language models (LLMs) are less well calibrated than their base pre-trained counterparts. However, little is known about the frequently used chat template's effect on the…

10
arXiv — NLP / Computation & Language research 29d ago

AutoTail-BSFGM: Class-Balance-Aware Fine-Tuning for Chinese Scholarly Text Classification

arXiv:2606.03576v1 Announce Type: new Abstract: Scholarly text classification supports literature organization, subject indexing, and research intelligence, but Chinese scholarly corpora often contain imbalanced and semantically adjacent disciplinary labels. We propose…

5
arXiv — NLP / Computation & Language research 29d ago

Safety Measurements for Fine-tuned LLMs Should be Grounded in Capability

arXiv:2606.03648v1 Announce Type: new Abstract: Adapting foundation large language models to a user's task or preferred style through fine-tuning can result in compromising the model's safety. Previous works examined the effects of fine-tuning on model safety in limited and…

32
r/LocalLLaMA community 29d ago

Remember around 2023-2024 when we did partys (wizardlm, nous capybara and dolphin) and finetunes?

Yes, I remember it. It was peak. Now those models get outperformed by 2026-era models. I want to revive this era, I miss it so bad 😞

29
llama.cpp releases dev-tools 1mo ago

b9468

server: real-time reasoning interruption via control endpoint (#23971). Builds on the manual reasoning budget trigger from #23949. Adds a CONTROL task that mirrors the CANCEL path on the live slot and calls…

17
arXiv — Machine Learning research 1mo ago

From Demonstrations to Rewards: Test-Time Prompt Optimization for VLM Reward Models

arXiv:2606.00083v1 Announce Type: new Abstract: Reinforcement learning relies on accurate reward functions, which are often hand-crafted or even unavailable in real-world applications, such as robotics. Recent work has explored the zero-shot reasoning capabilities of pre-trained…

11
arXiv — Machine Learning research 1mo ago

RAFT: Data Refinement and Adaptive Distillation for Domain Fine-Tuning with Alleviated Forgetting

arXiv:2606.00147v1 Announce Type: new Abstract: Domain-specific supervised fine-tuning (SFT) often improves in-domain performance at the cost of degrading a model's general capabilities. We view this degradation through two practical gaps in domain SFT: a…

10
arXiv — Machine Learning research 1mo ago

A Pre-Training Analogue of Grokking in Language Models: Tracing Delayed Grammatical Generalization

arXiv:2606.00230v1 Announce Type: new Abstract: Grokking, the phenomenon in which neural networks generalize long after fitting their training data, has been studied in supervised settings on many epochs. LLM pre-training instead involves next-token prediction over an unlabeled…

25
arXiv — Machine Learning research 1mo ago

ARCA: Adapter-Residual Credit Assignment When Token Signals Degenerate

arXiv:2606.00257v1 Announce Type: new Abstract: Token-level credit assignment for language-model reinforcement learning is usually formulated as if the policy were fully trainable, while practical LLM-RL pipelines often rely on parameter-efficient fine-tuning, especially LoRA.…

9
arXiv — Machine Learning research 1mo ago

CRMA: A Spectrally-Bounded Backbone for Modular Continual Fine-Tuning of LLMs

arXiv:2606.00382v1 Announce Type: new Abstract: Sequential fine-tuning of large language models forces a choice: let the shared substrate keep learning and accept catastrophic forgetting, or freeze it after task one and foreclose cross-task refinement. Per-task adapter methods…

23
arXiv — Machine Learning research 1mo ago

Escaping the Mode Lottery: Multi-Response Training Improves Language Model Generalization

arXiv:2606.00544v1 Announce Type: new Abstract: Modern language-model fine-tuning typically pairs each prompt with a single response, even though many prompts admit multiple valid completions. This effectively reduces a multi-modal conditional distribution to a one-sample view,…

13
arXiv — NLP / Computation & Language research 1mo ago

LinguIUTics at PsyDefDetect: Iterative Imbalance-Aware Fine-tuning of Qwen3-8B for Psychological Defense Mechanism Classification

arXiv:2606.00647v1 Announce Type: new Abstract: Detecting psychological defense mechanisms in conversational text remains a challenging clinical NLP problem. For the PsyDefDetect 2026 shared task (nine-class utterance classification evaluated via macro F1), our team LinguIUTics…

5
Hugging Face Daily Papers research 1mo ago

LongAttnComp: Cross-Family Context Compression for Long-Context Reasoning

Abstract LongAttnComp adapts AttnComp for long-context processing by fine-tuning lightweight attention layers and implementing token-level chunking and positional reordering techniques. AI-generated summary As real-world applications increasingly require processing inputs of…

27
Hugging Face Daily Papers research 1mo ago

On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters

Abstract Parameter-efficient fine-tuning can function as a compact substrate for persistent personal models by enabling small trainable adapters to store instance-specific behaviors on top of strong foundation models. AI-generated summary Parameter-efficient fine-tuning (PEFT)…

21
Hugging Face Daily Papers research 1mo ago

Draft-OPD: On-Policy Distillation for Speculative Draft Models

Abstract Speculative decoding uses a lightweight draft model to accelerate large language model inference, but supervised fine-tuning plateaus due to offline-to-inference mismatch, which is addressed through on-policy distillation with target-assisted rollouts and error replay.…

29
Hugging Face Daily Papers research 1mo ago

Compositional Text-to-Image Generation Via Region-aware Bimodal Direct Preference Optimization

Abstract BiDPO enhances text-to-image models for complex compositional prompts through preference-based fine-tuning and region-level guidance. AI-generated summary Despite the rapid progress of text-to-image (T2I) models, generating images that accurately reflect complex…

18
Hugging Face Daily Papers research 1mo ago

NITP: Next Implicit Token Prediction for LLM Pre-training

Abstract Next Implicit Token Prediction enhances language model training by adding dense continuous supervision in representation space, improving generalization and performance across model sizes with minimal computational overhead. AI-generated summary Standard next-token…

34
r/MachineLearning community 1mo ago

[P] Free AI Agent Security Assessment [P]

Hey everyone, We’re building Antitech, a security layer for AI agents and LLM-powered workflows. We’re opening a small number of free early-access assessments for teams/builders working on AI agents. If you give us access to an endpoint of a Dockerized / sandboxed environment…

8
Hugging Face Daily Papers research 1mo ago

DRIFT: Decoupled Rollouts and Importance-Weighted Fine-Tuning for Efficient Multi-Turn Optimization

Abstract DRIFT is a framework that combines offline trajectories with importance-weighted supervised fine-tuning to achieve multi-turn interactive learning efficiency and performance comparable to reinforcement learning. AI-generated summary Large language models are…

38
Hugging Face Daily Papers research 1mo ago

The Flip Side of RLHF: On-Policy Feedback for Reward Model Self-Supervised Improvement

Abstract SAVE framework improves reward model training by using value functions to grade on-policy responses and update models through contrastive objectives. AI-generated summary Building strong reward models (RMs) for language model alignment is bottlenecked by the cost and…

26
arXiv — Machine Learning research 1mo ago

The Long-Term Effects of Data Selection in LLM Fine-Tuning

arXiv:2605.30537v1 Announce Type: new Abstract: Data selection is increasingly used to reduce the cost of large language model (LLM) fine-tuning, with recent methods prioritizing samples by current utility, diversity, quality, or influence. This paper studies a different…

16
arXiv — Machine Learning research 1mo ago

CSULoRA: Closest Safe Update Low-Rank Adaptation

arXiv:2605.30640v1 Announce Type: new Abstract: Low-rank adaptation has become a standard method for parameter-efficient fine-tuning of large language models, but even small amounts of unsafe or adversarial fine-tuning data can substantially weaken the safety behavior of aligned…

28
arXiv — Machine Learning research 1mo ago

SemStruct: Contextualizing Semantic Embeddings with Structural Information for Schema Matching

arXiv:2605.30729v1 Announce Type: new Abstract: Schema matching is a fundamental step in integrating heterogeneous data sources. While Pre-trained Language Models (PLMs) have revolutionized this task by capturing linguistic semantics, they typically process tabular data as…

35
arXiv — Machine Learning research 1mo ago

Efficient and Uncertainty-Aware Diffusion Framework for Offline-to-Online Reinforcement Learning

arXiv:2605.30776v1 Announce Type: new Abstract: Offline-to-Online Reinforcement Learning (O2O-RL) leverages an offline, pre-trained policy to minimize costly online interactions. Although data-efficient, O2O-RL is susceptible to shifts between offline and online distributions.…

8
arXiv — NLP / Computation & Language research 1mo ago

Fine-Tuning Improves Information Conveyance in Language Models

arXiv:2605.30844v1 Announce Type: new Abstract: Fine-tuning is often believed to reduce uncertainty and diversity in large language models, but existing analyses overlook output length, a key confounder, and therefore fail to capture how uncertainty is distributed across an…

27
arXiv — NLP / Computation & Language research 1mo ago

MADS: Model-Aware Diverse Core Set Selection for Instruction Tuning

arXiv:2605.30857v1 Announce Type: new Abstract: Instruction fine-tuning is employed to enhance the instruction-following ability of large language models (LLMs). As the amount of instruction fine-tuning data increases, selecting the optimal core set becomes particularly…

16
arXiv — NLP / Computation & Language research 1mo ago

The Flip Side of RLHF: On-Policy Feedback for Reward Model Self-Supervised Improvement

arXiv:2605.30888v1 Announce Type: new Abstract: Building strong reward models (RMs) for language model alignment is bottlenecked by the cost and difficulty of acquiring diverse and reliable preference data from human annotation or judge models. It is dramatically worse as the…

31
