Tag: Training. 450 articles archived under #training. arXiv — Machine Learning research 14d ago Sparsity Curse: Understanding RLVR Model Parameter Space from Model Merging arXiv:2606.18521v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Reward (RLVR) has emerged as a powerful post-training paradigm that surpasses Supervised Fine-Tuning (SFT) in eliciting reasoning intelligence and resisting catastrophic forgetting. Recent… 13 arXiv — Machine Learning research 14d ago Robust and Interpretable Adaptation of Equivariant Materials Foundation Models via Sparsity-promoting Fine-tuning arXiv:2606.18691v1 Announce Type: new Abstract: Pre-trained materials foundation models, or machine learning interatomic potentials, leverage general physicochemical knowledge to effectively approximate potential energy surfaces. However, they often require domain-specific… 10 arXiv — Machine Learning research 14d ago FoMoE: Breaking the Full-Replica Barrier with a Federation of MoEs arXiv:2606.19025v1 Announce Type: new Abstract: Pre-training Large Language Models (LLMs) typically demands large-scale infrastructure with tightly coupled hardware accelerators. While increasing model and dataset scale remains the dominant driver of performance,… 9 Hugging Face official-blog 14d ago Beyond LoRA: Can you beat the most popular fine-tuning technique?
Published June 18, 2026. Benjamin Bossan (BenjaminB), Sayak Paul (sayakpaul), Marian (hubnemo), Kashif Rasul (kashif). When you plan to fine-tune a model in a… 16 llama.cpp releases dev-tools 14d ago b9688 server: (router) add model management API (#23976) wip server: (router) add SSE realtime updates API nits wip add download API add download api update docs add delete endpoint fix std::terminate fix crash fix 2 add tests nits macOS/iOS: macOS Apple Silicon (arm64) macOS Apple… 17 arXiv — Machine Learning research 15d ago A Risk Decomposition Framework for Pre-Hoc Fine-Tuning Prediction arXiv:2606.17649v1 Announce Type: new Abstract: The high cost of fine-tuning LLMs poses a significant economic barrier; pre-hoc performance prediction offers a critical solution to substantially reduce this expense. However, the theoretical limits of pre-hoc performance… 11 arXiv — Machine Learning research 15d ago TuneAhead: Predicting Fine-tuning Performance Before Full Training Begins arXiv:2606.17660v1 Announce Type: new Abstract: Fine-tuning large language models (LLMs) is compute-intensive and error-prone: model performance depends sensitively on data quality and hyperparameter choices, and naïve runs can even degrade model performance. This raises a… 21 arXiv — Machine Learning research 15d ago Handling Feature Heterogeneity with Learnable Graph Patches arXiv:2606.17667v1 Announce Type: new Abstract: In recent years, the rapid development of foundation models and graph pre-training technologies has spurred increasing interest in constructing a universal pre-trained graph model or Graph Foundation Model (GFM).
However, a… 34 arXiv — Machine Learning research 15d ago From Reasoning Traces to Reusable Modules: Understanding Compositional Generalization in Language Model Reasoning arXiv:2606.18089v1 Announce Type: new Abstract: Post-training pipelines that combine supervised fine-tuning (SFT) with reinforcement learning (RL) have emerged as the key recipe for transforming large language models (LLMs) into robust reasoners. We argue that this combined… 17 arXiv — NLP / Computation & Language research 15d ago RepSelect: Robust LLM Unlearning via Representation Selectivity arXiv:2606.17168v1 Announce Type: new Abstract: Making large language models (LLMs) deeply forget specific knowledge and values without sacrificing general capabilities remains a central challenge in unlearning. However, current methods are easily reversed by fine-tuning or… 29 arXiv — NLP / Computation & Language research 15d ago Improving low-resource ASR using bilingual fine-tuning with language identification: a cross-linguistic evaluation arXiv:2606.17820v1 Announce Type: new Abstract: This study explores how bilingual fine-tuning affects automatic speech recognition (ASR) in low-resource languages. We evaluate this method across nine linguistically and geographically diverse language pairs, covering a range of… 28 arXiv — NLP / Computation & Language research 15d ago Learning task-specific subspaces via interventional post-training of speech foundation models arXiv:2606.17967v1 Announce Type: new Abstract: Speech foundation models, pre-trained on large corpora of unlabelled speech data, produce general-purpose representations which are useful across tasks. 
However, these representations encode information about salient speech… 5 arXiv — NLP / Computation & Language research 15d ago Fine-tuning LLMs for Passive Depression Severity Estimation from AI Mental Health Dialogue arXiv:2606.17973v1 Announce Type: new Abstract: Depression is the leading cause of disability worldwide, and early detection of symptom change is essential for timely intervention. Validated instruments such as the Patient Health Questionnaire-9 (PHQ-9) support symptom… 17 arXiv — NLP / Computation & Language research 15d ago When English Isn't the Best Teacher: Source Language Effects in Cross-Lingual In-Context Learning arXiv:2606.18033v1 Announce Type: new Abstract: Cross-lingual transfer in multilingual NLP has been widely explored in supervised fine-tuning contexts, where factors like data availability and linguistic similarity largely determine transfer quality. As the field shifts toward… 13 arXiv — NLP / Computation & Language research 15d ago Beyond Native Success: Auditing Deployment-Interface Exposure of CLIP Backdoors arXiv:2606.17815v1 Announce Type: cross Abstract: Contrastive Language-Image Pre-training models are widely reused across downstream interfaces, including feature extraction, retrieval, reranking, and selection. Existing CLIP backdoor, however, usually validate attacks on a… 11 r/LocalLLaMA community 15d ago Be wary of Qwen/Claude distillations - they're often worse than the base model Just to be clear; I am not attempting to call anybody out or be mean to those who take the time/money to make these models, I just want to inform people about these distills/finetunes since there's clearly some confusion going on. 
I'm going to assume those of us who often visit… 37 Hugging Face Daily Papers research 15d ago Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning Abstract Prompt-Level Distillation extracts reasoning patterns from teacher models to enhance student model performance while maintaining interpretability and reducing latency. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Advanced reasoning typically requires Chain-of-Thought… 18 Hugging Face Daily Papers research 16d ago Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes Abstract Hierarchical Advantage-Weighted Behavior Cloning (HABC) addresses sparse reward challenges in robot learning by separately optimizing viability and efficiency objectives through adaptive critic heads and intervention-aware credit assignment, significantly improving… 9 arXiv — Machine Learning research 16d ago Zero-order Parameter-free Optimization for LMO-based Methods: Novel Approach for Efficient Fine-tuning arXiv:2606.14970v1 Announce Type: new Abstract: Fine-tuning large language models (LLMs) has become a central application of modern optimization, enabling pretrained models to adapt to diverse downstream tasks and domain-specific data. A major obstacle in large-scale fine-tuning… 35 arXiv — Machine Learning research 16d ago FastMix: Fast Data Mixture Optimization via Gradient Descent arXiv:2606.14971v1 Announce Type: new Abstract: While large and diverse datasets have driven recent advances in large models, identifying the optimal data mixture for pre-training and post-training remains a significant open problem. We address this challenge with FASTMIX, a… 23 arXiv — Machine Learning research 16d ago Greedy Coordinate Diffusion: Effective and Semantically Coherent Adversarial Attacks via Diffusion Guidance arXiv:2606.15531v1 Announce Type: new Abstract: Fine-tuning aligned language models on benign tasks (e.g. 
math tutoring) systematically breaks safety guardrails, even when training data contains no harmful content. While mechanistic approaches have shed light on where alignment… 36 arXiv — Machine Learning research 16d ago Conflict-Aware Federated Fine-Tuning of Large Language Models with Mixture-of-Experts arXiv:2606.15625v1 Announce Type: new Abstract: The continuous scaling of large language models (LLMs) incurs prohibitive computational costs, making Mixture-of-Experts (MoE) a scalable alternative for efficient fine-tuning via sparse activation. While federated learning (FL)… 11 arXiv — NLP / Computation & Language research 16d ago Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning arXiv:2606.15007v1 Announce Type: new Abstract: We introduce Nemotron 3 Ultra, a 550 billion total and 55 billion active parameter Mixture-of-Experts Hybrid Mamba-Attention language model. We pre-trained Nemotron 3 Ultra on 20 trillion text tokens, then extended the context… 19 Hugging Face Daily Papers research 16d ago Retrieve, Don't Retrain: Extending Vision Language Action Models to New Tasks at Test Time Abstract Retrieval-augmented vision-language-action policies eliminate per-task fine-tuning costs by using pre-trained models with indexed demonstrations, enabling efficient cross-embodiment generalization and task adaptation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct… 26 r/LocalLLaMA community 16d ago Improving Neural Network Training by Decoupling the Magnitude and Direction of Weight Vectors | Alexander Hägele This looks very promising in terms of simplifying and accelerating fine-tuning. submitted by /u/Thrumpwart 37 r/LocalLLaMA community 16d ago We trained a cybersecurity-focused Mythos like LLM open weights on HuggingFace We built OpenMythos for the Build Small Hackathon an open-source LLM trained specifically for cybersecurity tasks.
Wanted to share our training approach since the RLVR setup was non-trivial and might be interesting to people doing similar domain-specific fine-tuning. The problem… 7 NVIDIA Developer Blog official-blog 16d ago Fine-Tuning Biological Foundation Models with LoRA Using NVIDIA BioNeMo Recipes Foundation models are reshaping computational biology. Pretrained on massive corpora of protein or genomic sequences, models such as ESM2 (a protein language... 8 arXiv — Machine Learning research 17d ago Beyond LoRA: Is Sparsity-Induced Adaptation Better? arXiv:2606.13767v1 Announce Type: new Abstract: Low-rank adaptation (LoRA) and its variants provide a memory- and compute-efficient alternative to full fine-tuning of pre-trained models. However, questions remain about the comparative generalizability of these approaches and how… 28 arXiv — NLP / Computation & Language research 17d ago Be My Tutor: On-Policy Co-Distillation for Mutual LLM Improvement via Peer Feedback arXiv:2606.14368v1 Announce Type: cross Abstract: We study multi-domain LLM training in which two models, each stronger in a different domain, co-evolve by tutoring each other through on-policy feedback. Unlike one-way distillation or single-model fine-tuning, our goal is mutual… 20 arXiv — NLP / Computation & Language research 17d ago CoRe: A Continuously Reward-Finetuned LLM Query Rewriter for Multi-Stage Context-Aware Relevance in Web-Scale Video Search arXiv:2606.14127v1 Announce Type: cross Abstract: LLM-based query rewriters in production face a tension: the training reward must reflect how the rewrite is consumed by the production ranker, yet the training procedure must be cheap enough to support continuous redeployment as… 17 r/LocalLLaMA community 18d ago Dual r9700 ai pro for training llms? I am a developer and need a high-VRAM machine to fine-tune LLMs. How has your experience been with fine-tuning/training on multi-GPU setups with 2x R9700 AMD AI Pro GPUs?
submitted by /u/AppropriatePush6262 13 r/LocalLLaMA community 19d ago New model on huggingface https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B A qwen finetune. Looks pretty even with qwen 3.7 plus, except it's actually open source. Disclosure: I work as a researcher for the city government of Rio de Janeiro, which developed this model. submitted by … 15 Hugging Face Daily Papers research 19d ago A Stationary (and Therefore Compatible) Representation is All You Need Abstract Stationary representations learned through d-Simplex fixed classifiers ensure model compatibility during sequential fine-tuning and updates, enabling continuous retrieval services without reprocessing. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Learning compatible… 25 Hugging Face Daily Papers research 20d ago Revisiting Articulated Parts Perception in Robot Manipulation Abstract A new geometric representation called Geometric Primary Structure (GPS) is introduced for articulated parts perception, enabling efficient data collection through VR annotation and achieving high manipulation success rates without fine-tuning. Generated by… 27 arXiv — NLP / Computation & Language research 20d ago MentalMARBERT: Domain-Adaptive Pre-training and Two-Stage Fine-Tuning for Arabic Mental Health Disorders Detection arXiv:2606.12649v1 Announce Type: new Abstract: Detecting mental health disorders from Arabic social media text remains challenging due to dialectal variation, informal language, limited high-quality annotated resources, and severe class imbalance.
While English mental health… 25 arXiv — NLP / Computation & Language research 20d ago Small LLMs for Biomedical Claim Verification: Cost-Effective Fine-Tuning, Structural Dataset Shortcuts, and Cross-Domain Generalization arXiv:2606.12854v1 Announce Type: new Abstract: Large Language Models such as GPT-4o and GPT-5 achieve strong zero-shot performance on biomedical claim verification, but cost and opacity limit scalable use. We fine-tune three small LLMs: Phi-3-mini (3.8B), Qwen2.5-3B, and… 33 arXiv — NLP / Computation & Language research 20d ago Direct Preference Optimization for Chatbot Fine-Tuning: An Empirical Study arXiv:2606.12881v1 Announce Type: new Abstract: We present an approach to fine-tuning large language models using Direct Preference Optimization (DPO), a reinforcement learning technique. Our experimental results demonstrate that DPO simplifies the training pipeline, improves… 24 arXiv — NLP / Computation & Language research 20d ago PolyAlign: Conditional Human-Distribution Alignment arXiv:2606.13227v1 Announce Type: new Abstract: Post-training methods such as supervised fine-tuning (SFT) and preference optimization typically align language models toward a single global assistant behavior. 
While effective for improving average helpfulness, this can suppress… 29 arXiv — NLP / Computation & Language research 20d ago Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning arXiv:2606.13680v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG) has become a standard mechanism for grounding language models in external knowledge, yet conventional retrieval based on lexical or semantic similarity is poorly suited for complex reasoning… 11 arXiv — NLP / Computation & Language research 20d ago Understanding helpfulness and harmless tension in reward models arXiv:2606.13209v1 Announce Type: cross Abstract: Reward models are a key component of reinforcement learning from human feedback (RLHF), aligning language models toward both helpful and harmless behaviour. However, the internal mechanisms underlying these objectives and their… 12 Hugging Face Daily Papers research 20d ago Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents Abstract Evoflux enables compact language models to execute tool workflows more reliably by using evolutionary search to repair failed plans during inference, significantly improving execution feasibility compared to traditional fine-tuning methods. Generated by… 20 Hugging Face Daily Papers research 21d ago Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training Abstract ART enables parameter-efficient fine-tuning of frozen multimodal language models by optimizing raw visual input through gradient backpropagation, achieving performance comparable to LoRA while supporting pre-compiled computational graphs. Generated by… 8 r/LocalLLaMA community 21d ago Refiner: Robotics library from the ex-Hugging Face pre-training team The ex-Hugging Face pre-training team just announced a new library created for robotics data refinement!
It supports ingestion of all robotics formats (Parquet, HDF5, MCAP, Zarr, RLDS, and LeRobot), as well as the common processing flows like visual hand-tracking, subtask annotations… 26 r/LocalLLaMA community 21d ago AMD R9700 vs GB10 I have a budget of 5K and want to buy some GPUs. My requirement is 48GB+ VRAM, because I fine-tune small language models and perform DPO; in general, tinkering/development is my use case. If you were in my shoes, which of these would you get? On one hand, AMD is better bang for the buck,… 4 arXiv — Machine Learning research 21d ago Probabilistic Contrastive Pretraining for Multi-task ADME Property Prediction arXiv:2606.11508v1 Announce Type: new Abstract: Accurate prediction of absorption, distribution, metabolism, and excretion (ADME) properties is critical to drug discovery, but remains challenging because ADME endpoints are noisy, interdependent, and often data-limited. We… 13 arXiv — Machine Learning research 21d ago Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training arXiv:2606.11854v1 Announce Type: new Abstract: There are two main Parameter-Efficient Fine-Tuning (PEFT) techniques for Large Language Models (LLMs). While Low-Rank Adaptation (LoRA) introduces additional weights between the LLM layers, Soft Prompting introduces additional… 18 arXiv — Machine Learning research 21d ago Harness In-Context Operator Learning with Chain of Operators arXiv:2606.12318v1 Announce Type: new Abstract: Neural operators approximate mappings between function spaces, but often generalize poorly to other operators and usually require fine-tuning or retraining.
In-Context Operator Networks (ICON) addresses this issue by prompting the… 28 arXiv — NLP / Computation & Language research 21d ago Compatibility-Aware Dynamic Fine-Tuning for Large Language Models arXiv:2606.11206v1 Announce Type: new Abstract: Supervised Fine-Tuning (SFT) is the predominant paradigm for aligning large language models (LLMs), yet it suffers from optimization instability and limited generalization. Recent work attributes this issue to pathological gradient… 20 arXiv — NLP / Computation & Language research 21d ago When Probing Accuracy Saturates, Fragility Resolves: A Complementary Metric for LLM Pre-Training Analysis arXiv:2606.11375v1 Announce Type: new Abstract: Standard linear probing declares a property "encoded" when a classifier on hidden states achieves high accuracy. The protocol works well on a snapshot but breaks across pre-training: probe accuracy saturates within the first few… 17 arXiv — NLP / Computation & Language research 21d ago Lius: Translation Model Based Instructional Lingustic Using Continual Instruction Tuning In Kupang Malay arXiv:2606.11786v1 Announce Type: new Abstract: Large Language Models (LLMs) offer new potential for translation tasks but often experience performance degradation when handling low-resource languages. To address this limitation, we propose an approach for fine-tuning LLMs on a… 37