Tag

Training

450 articles archived under #training · RSS

arXiv — Machine Learning research 14d ago

Sparsity Curse: Understanding RLVR Model Parameter Space from Model Merging

arXiv:2606.18521v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Reward (RLVR) has emerged as a powerful post-training paradigm that surpasses Supervised Fine-Tuning (SFT) in eliciting reasoning intelligence and resisting catastrophic forgetting. Recent…

13
arXiv — Machine Learning research 14d ago

Robust and Interpretable Adaptation of Equivariant Materials Foundation Models via Sparsity-promoting Fine-tuning

arXiv:2606.18691v1 Announce Type: new Abstract: Pre-trained materials foundation models, or machine learning interatomic potentials, leverage general physicochemical knowledge to effectively approximate potential energy surfaces. However, they often require domain-specific…

10
arXiv — Machine Learning research 14d ago

FoMoE: Breaking the Full-Replica Barrier with a Federation of MoEs

arXiv:2606.19025v1 Announce Type: new Abstract: Pre-training Large Language Models (LLMs) typically demands large-scale infrastructure with tightly coupled hardware accelerators. While increasing model and dataset scale remains the dominant driver of performance,…

9
Hugging Face official-blog 14d ago

Beyond LoRA: Can you beat the most popular fine-tuning technique?

Published June 18, 2026. By Benjamin Bossan (BenjaminB), Sayak Paul (sayakpaul), Marian (hubnemo), and Kashif Rasul (kashif). When you plan to fine-tune a model in a…

16
llama.cpp releases dev-tools 14d ago

b9688

server: (router) add model management API (#23976); wip; server: (router) add SSE realtime updates API; nits; wip; add download API; add download api; update docs; add delete endpoint; fix std::terminate; fix crash; fix 2; add tests; nits. macOS/iOS: macOS Apple Silicon (arm64), macOS Apple…

17
arXiv — Machine Learning research 15d ago

A Risk Decomposition Framework for Pre-Hoc Fine-Tuning Prediction

arXiv:2606.17649v1 Announce Type: new Abstract: The high cost of fine-tuning LLMs poses a significant economic barrier; pre-hoc performance prediction offers a critical solution to substantially reduce this expense. However, the theoretical limits of pre-hoc performance…

11
arXiv — Machine Learning research 15d ago

TuneAhead: Predicting Fine-tuning Performance Before Full Training Begins

arXiv:2606.17660v1 Announce Type: new Abstract: Fine-tuning large language models (LLMs) is compute-intensive and error-prone: model performance depends sensitively on data quality and hyperparameter choices, and naïve runs can even degrade model performance. This raises a…

21
arXiv — Machine Learning research 15d ago

Handling Feature Heterogeneity with Learnable Graph Patches

arXiv:2606.17667v1 Announce Type: new Abstract: In recent years, the rapid development of foundation models and graph pre-training technologies has spurred increasing interest in constructing a universal pre-trained graph model or Graph Foundation Model (GFM). However, a…

34
arXiv — Machine Learning research 15d ago

From Reasoning Traces to Reusable Modules: Understanding Compositional Generalization in Language Model Reasoning

arXiv:2606.18089v1 Announce Type: new Abstract: Post-training pipelines that combine supervised fine-tuning (SFT) with reinforcement learning (RL) have emerged as the key recipe for transforming large language models (LLMs) into robust reasoners. We argue that this combined…

17
arXiv — NLP / Computation & Language research 15d ago

RepSelect: Robust LLM Unlearning via Representation Selectivity

arXiv:2606.17168v1 Announce Type: new Abstract: Making large language models (LLMs) deeply forget specific knowledge and values without sacrificing general capabilities remains a central challenge in unlearning. However, current methods are easily reversed by fine-tuning or…

29
arXiv — NLP / Computation & Language research 15d ago

Improving low-resource ASR using bilingual fine-tuning with language identification: a cross-linguistic evaluation

arXiv:2606.17820v1 Announce Type: new Abstract: This study explores how bilingual fine-tuning affects automatic speech recognition (ASR) in low-resource languages. We evaluate this method across nine linguistically and geographically diverse language pairs, covering a range of…

28
arXiv — NLP / Computation & Language research 15d ago

Learning task-specific subspaces via interventional post-training of speech foundation models

arXiv:2606.17967v1 Announce Type: new Abstract: Speech foundation models, pre-trained on large corpora of unlabelled speech data, produce general-purpose representations which are useful across tasks. However, these representations encode information about salient speech…

5
arXiv — NLP / Computation & Language research 15d ago

Fine-tuning LLMs for Passive Depression Severity Estimation from AI Mental Health Dialogue

arXiv:2606.17973v1 Announce Type: new Abstract: Depression is the leading cause of disability worldwide, and early detection of symptom change is essential for timely intervention. Validated instruments such as the Patient Health Questionnaire-9 (PHQ-9) support symptom…

17
arXiv — NLP / Computation & Language research 15d ago

When English Isn't the Best Teacher: Source Language Effects in Cross-Lingual In-Context Learning

arXiv:2606.18033v1 Announce Type: new Abstract: Cross-lingual transfer in multilingual NLP has been widely explored in supervised fine-tuning contexts, where factors like data availability and linguistic similarity largely determine transfer quality. As the field shifts toward…

13
arXiv — NLP / Computation & Language research 15d ago

Beyond Native Success: Auditing Deployment-Interface Exposure of CLIP Backdoors

arXiv:2606.17815v1 Announce Type: cross Abstract: Contrastive Language-Image Pre-training models are widely reused across downstream interfaces, including feature extraction, retrieval, reranking, and selection. Existing CLIP backdoor, however, usually validate attacks on a…

11
r/LocalLLaMA community 15d ago

Be wary of Qwen/Claude distillations - they're often worse than the base model

Just to be clear: I am not attempting to call anybody out or be mean to those who take the time/money to make these models. I just want to inform people about these distills/finetunes, since there's clearly some confusion going on. I'm going to assume those of us who often visit…

37
Hugging Face Daily Papers research 15d ago

Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning

Abstract Prompt-Level Distillation extracts reasoning patterns from teacher models to enhance student model performance while maintaining interpretability and reducing latency. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Advanced reasoning typically requires Chain-of-Thought…

18
Hugging Face Daily Papers research 16d ago

Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes

Abstract Hierarchical Advantage-Weighted Behavior Cloning (HABC) addresses sparse reward challenges in robot learning by separately optimizing viability and efficiency objectives through adaptive critic heads and intervention-aware credit assignment, significantly improving…

9
arXiv — Machine Learning research 16d ago

Zero-order Parameter-free Optimization for LMO-based Methods: Novel Approach for Efficient Fine-tuning

arXiv:2606.14970v1 Announce Type: new Abstract: Fine-tuning large language models (LLMs) has become a central application of modern optimization, enabling pretrained models to adapt to diverse downstream tasks and domain-specific data. A major obstacle in large-scale fine-tuning…

35
arXiv — Machine Learning research 16d ago

FastMix: Fast Data Mixture Optimization via Gradient Descent

arXiv:2606.14971v1 Announce Type: new Abstract: While large and diverse datasets have driven recent advances in large models, identifying the optimal data mixture for pre-training and post-training remains a significant open problem. We address this challenge with FASTMIX, a…

23
arXiv — Machine Learning research 16d ago

Greedy Coordinate Diffusion: Effective and Semantically Coherent Adversarial Attacks via Diffusion Guidance

arXiv:2606.15531v1 Announce Type: new Abstract: Fine-tuning aligned language models on benign tasks (e.g. math tutoring) systematically breaks safety guardrails, even when training data contains no harmful content. While mechanistic approaches have shed light on where alignment…

36
arXiv — Machine Learning research 16d ago

Conflict-Aware Federated Fine-Tuning of Large Language Models with Mixture-of-Experts

arXiv:2606.15625v1 Announce Type: new Abstract: The continuous scaling of large language models (LLMs) incurs prohibitive computational costs, making Mixture-of-Experts (MoE) a scalable alternative for efficient fine-tuning via sparse activation. While federated learning (FL)…

11
arXiv — NLP / Computation & Language research 16d ago

Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

arXiv:2606.15007v1 Announce Type: new Abstract: We introduce Nemotron 3 Ultra, a 550 billion total and 55 billion active parameter Mixture-of-Experts Hybrid Mamba-Attention language model. We pre-trained Nemotron 3 Ultra on 20 trillion text tokens, then extended the context…

19
Hugging Face Daily Papers research 16d ago

Retrieve, Don't Retrain: Extending Vision Language Action Models to New Tasks at Test Time

Abstract Retrieval-augmented vision-language-action policies eliminate per-task fine-tuning costs by using pre-trained models with indexed demonstrations, enabling efficient cross-embodiment generalization and task adaptation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

26
r/LocalLLaMA community 16d ago

Improving Neural Network Training by Decoupling the Magnitude and Direction of Weight Vectors | Alexander Hägele

This looks very promising in terms of simplifying and accelerating fine-tuning. Submitted by /u/Thrumpwart.

37
r/LocalLLaMA community 16d ago

We trained a cybersecurity-focused Mythos like LLM open weights on HuggingFace

We built OpenMythos for the Build Small Hackathon: an open-source LLM trained specifically for cybersecurity tasks. Wanted to share our training approach since the RLVR setup was non-trivial and might be interesting to people doing similar domain-specific fine-tuning. The problem…

7
NVIDIA Developer Blog official-blog 16d ago

Fine-Tuning Biological Foundation Models with LoRA Using NVIDIA BioNeMo Recipes

Foundation models are reshaping computational biology. Pretrained on massive corpora of protein or genomic sequences, models such as ESM2 (a protein language...

8
arXiv — Machine Learning research 17d ago

Beyond LoRA: Is Sparsity-Induced Adaptation Better?

arXiv:2606.13767v1 Announce Type: new Abstract: Low-rank adaptation (LoRA) and its variants provide a memory- and compute-efficient alternative to full fine-tuning of pre-trained models. However, questions remain about the comparative generalizability of these approaches and how…

28
arXiv — NLP / Computation & Language research 17d ago

Be My Tutor: On-Policy Co-Distillation for Mutual LLM Improvement via Peer Feedback

arXiv:2606.14368v1 Announce Type: cross Abstract: We study multi-domain LLM training in which two models, each stronger in a different domain, co-evolve by tutoring each other through on-policy feedback. Unlike one-way distillation or single-model fine-tuning, our goal is mutual…

20
arXiv — NLP / Computation & Language research 17d ago

CoRe: A Continuously Reward-Finetuned LLM Query Rewriter for Multi-Stage Context-Aware Relevance in Web-Scale Video Search

arXiv:2606.14127v1 Announce Type: cross Abstract: LLM-based query rewriters in production face a tension: the training reward must reflect how the rewrite is consumed by the production ranker, yet the training procedure must be cheap enough to support continuous redeployment as…

17
r/LocalLLaMA community 18d ago

Dual r9700 ai pro for training llms?

I am a developer and need a high-VRAM machine to fine-tune LLMs. How has your experience been with fine-tuning/training on a multi-GPU setup with 2x AMD AI Pro R9700 GPUs? Submitted by /u/AppropriatePush6262.

13
r/LocalLLaMA community 19d ago

New model on huggingface

https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B A Qwen fine-tune. Looks roughly even with Qwen 3.7 Plus, except it's actually open source. Disclosure: I work as a researcher for the city government of Rio de Janeiro, which developed this model.

15
Hugging Face Daily Papers research 19d ago

A Stationary (and Therefore Compatible) Representation is All You Need

Abstract Stationary representations learned through d-Simplex fixed classifiers ensure model compatibility during sequential fine-tuning and updates, enabling continuous retrieval services without reprocessing. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Learning compatible…

25
Hugging Face Daily Papers research 20d ago

Revisiting Articulated Parts Perception in Robot Manipulation

Abstract A new geometric representation called Geometric Primary Structure (GPS) is introduced for articulated parts perception, enabling efficient data collection through VR annotation and achieving high manipulation success rates without fine-tuning. Generated by…

27
arXiv — NLP / Computation & Language research 20d ago

MentalMARBERT: Domain-Adaptive Pre-training and Two-Stage Fine-Tuning for Arabic Mental Health Disorders Detection

arXiv:2606.12649v1 Announce Type: new Abstract: Detecting mental health disorders from Arabic social media text remains challenging due to dialectal variation, informal language, limited high-quality annotated resources, and severe class imbalance. While English mental health…

25
arXiv — NLP / Computation & Language research 20d ago

Small LLMs for Biomedical Claim Verification: Cost-Effective Fine-Tuning, Structural Dataset Shortcuts, and Cross-Domain Generalization

arXiv:2606.12854v1 Announce Type: new Abstract: Large Language Models such as GPT-4o and GPT-5 achieve strong zero-shot performance on biomedical claim verification, but cost and opacity limit scalable use. We fine-tune three small LLMs: Phi-3-mini (3.8B), Qwen2.5-3B, and…

33
arXiv — NLP / Computation & Language research 20d ago

Direct Preference Optimization for Chatbot Fine-Tuning: An Empirical Study

arXiv:2606.12881v1 Announce Type: new Abstract: We present an approach to fine-tuning large language models using Direct Preference Optimization (DPO), a reinforcement learning technique. Our experimental results demonstrate that DPO simplifies the training pipeline, improves…

24
arXiv — NLP / Computation & Language research 20d ago

PolyAlign: Conditional Human-Distribution Alignment

arXiv:2606.13227v1 Announce Type: new Abstract: Post-training methods such as supervised fine-tuning (SFT) and preference optimization typically align language models toward a single global assistant behavior. While effective for improving average helpfulness, this can suppress…

29
arXiv — NLP / Computation & Language research 20d ago

Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning

arXiv:2606.13680v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG) has become a standard mechanism for grounding language models in external knowledge, yet conventional retrieval based on lexical or semantic similarity is poorly suited for complex reasoning…

11
arXiv — NLP / Computation & Language research 20d ago

Understanding helpfulness and harmless tension in reward models

arXiv:2606.13209v1 Announce Type: cross Abstract: Reward models are a key component of reinforcement learning from human feedback (RLHF), aligning language models toward both helpful and harmless behaviour. However, the internal mechanisms underlying these objectives and their…

12
Hugging Face Daily Papers research 20d ago

Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents

Abstract Evoflux enables compact language models to execute tool workflows more reliably by using evolutionary search to repair failed plans during inference, significantly improving execution feasibility compared to traditional fine-tuning methods. Generated by…

20
Hugging Face Daily Papers research 21d ago

Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training

Abstract ART enables parameter-efficient fine-tuning of frozen multimodal language models by optimizing raw visual input through gradient backpropagation, achieving performance comparable to LoRA while supporting pre-compiled computational graphs. Generated by…

8
r/LocalLLaMA community 21d ago

Refiner: Robotics library from the ex-Hugging Face pre-training team

The ex-Hugging Face pre-training team just announced a new library created for robotics data refinement! It supports ingestion of all robotics formats (Parquet, HDF5, MCAP, Zarr, RLDS, and LeRobot), as well as common processing flows like visual hand-tracking, subtask annotations…

26
r/LocalLLaMA community 21d ago

AMD R9700 vs GB10

I have a budget of 5K and want to buy some GPUs. My requirement is 48 GB+ of VRAM, because I fine-tune small language models and perform DPO; in general, tinkering/development is my use case. If you were in my shoes, which of these would you get? On one hand, AMD is better bang for the buck,…

4
arXiv — Machine Learning research 21d ago

Probabilistic Contrastive Pretraining for Multi-task ADME Property Prediction

arXiv:2606.11508v1 Announce Type: new Abstract: Accurate prediction of absorption, distribution, metabolism, and excretion (ADME) properties is critical to drug discovery, but remains challenging because ADME endpoints are noisy, interdependent, and often data-limited. We…

13
arXiv — Machine Learning research 21d ago

Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training

arXiv:2606.11854v1 Announce Type: new Abstract: There are two main Parameter-Efficient Fine-Tuning (PEFT) techniques for Large Language Models (LLMs). While Low-Rank Adaptation (LoRA) introduces additional weights between the LLM layers, Soft Prompting introduces additional…

18
arXiv — Machine Learning research 21d ago

Harness In-Context Operator Learning with Chain of Operators

arXiv:2606.12318v1 Announce Type: new Abstract: Neural operators approximate mappings between function spaces, but often generalize poorly to other operators and usually require fine-tuning or retraining. In-Context Operator Networks (ICON) addresses this issue by prompting the…

28
arXiv — NLP / Computation & Language research 21d ago

Compatibility-Aware Dynamic Fine-Tuning for Large Language Models

arXiv:2606.11206v1 Announce Type: new Abstract: Supervised Fine-Tuning (SFT) is the predominant paradigm for aligning large language models (LLMs), yet it suffers from optimization instability and limited generalization. Recent work attributes this issue to pathological gradient…

20
arXiv — NLP / Computation & Language research 21d ago

When Probing Accuracy Saturates, Fragility Resolves: A Complementary Metric for LLM Pre-Training Analysis

arXiv:2606.11375v1 Announce Type: new Abstract: Standard linear probing declares a property "encoded" when a classifier on hidden states achieves high accuracy. The protocol works well on a snapshot but breaks across pre-training: probe accuracy saturates within the first few…

17
arXiv — NLP / Computation & Language research 21d ago

Lius: Translation Model Based Instructional Lingustic Using Continual Instruction Tuning In Kupang Malay

arXiv:2606.11786v1 Announce Type: new Abstract: Large Language Models (LLMs) offer new potential for translation tasks but often experience performance degradation when handling low-resource languages. To address this limitation, we propose an approach for fine-tuning LLMs on a…

37
