Tag: Training · 450 articles archived under #training

arXiv — Machine Learning research 28d ago RL Excursions during Pre-Training: Re-examining Policy Optimization for LLM training arXiv:2606.04272v1 Announce Type: new Abstract: The standard LLM training pipeline applies reinforcement learning (RL) only after pre-training and supervised fine-tuning (SFT). We question this status quo by training a LLM from scratch and applying RL, SFT, and SFT followed by… 28

arXiv — NLP / Computation & Language research 28d ago Sparse Mixture-of-Experts Reward Models Learn Interpretable and Specialized Experts for Personalized Preference Modeling arXiv:2606.04284v1 Announce Type: cross Abstract: Preference modeling plays a central role in reinforcement learning from human feedback (RLHF), enabling large language models (LLMs) to align with human values. However, most existing approaches assume a universal reward… 28

arXiv — Machine Learning research 28d ago OpenRFM: Dissecting Relational In-Context Learning arXiv:2606.04320v1 Announce Type: new Abstract: Relational Foundation Models (RFMs) promise a single pre-trained predictor that, given any relational database, returns predictions in one forward pass via relational in-context learning (ICL). Yet a substantial gap separates open… 19

arXiv — Machine Learning research 28d ago (Mis)generalization of Helpful-only Fine-tuning arXiv:2606.04413v1 Announce Type: new Abstract: Helpful-only models, that is, models that are trained to always follow user intent, are valuable for dangerous capability evaluations and other areas of AI R&D where refusals would be an obstacle.
Little is known about the… 34

arXiv — NLP / Computation & Language research 28d ago Long Live Fine-Tuning: Task-Specific Transformers Outperform Zero-Shot LLMs for Misinformation Response Classification on Reddit arXiv:2606.04274v1 Announce Type: new Abstract: As large language models (LLMs) become default tools for online information verification, an implicit assumption follows them: that scale and general capability are sufficient for nuanced classification of misinformation discourse.… 30

arXiv — NLP / Computation & Language research 28d ago Parameter-Efficient Fine-Tuning with Learnable Rank arXiv:2606.04325v1 Announce Type: new Abstract: Low-Rank Adaptation (LoRA) is a popular parameter-efficient fine-tuning (PEFT) method that restricts weight updates to low-rank adapters, introducing a fixed low-rank inductive bias by optimizing in a low-dimensional subspace. In… 16

arXiv — NLP / Computation & Language research 28d ago StepPRM-RTL: Stepwise Process-Reward Guided LLM Fine-Tuning for Enhanced RTL Synthesis arXiv:2606.04246v1 Announce Type: cross Abstract: Automatic generation of RTL code for digital hardware designs remains challenging due to long-horizon reasoning, multi-step dependencies, and strict correctness constraints in Verilog and VHDL. We present StepPRM-RTL, a novel… 8

r/LocalLLaMA community 28d ago The first Gemma 4 12B finetunes are ready Now you can start building your Gemma 4 12B collection :) https://huggingface.co/igorls/gemma-4-12B-it-heretic-GGUF https://huggingface.co/ReadyArt/Melody1437-12B-v0.4-GGUF https://huggingface.co/DuoNeural/Gemma4-12B-IT-Abliterated-GGUF… 26

r/LocalLLaMA community 28d ago gemma-4-12b-it vs Qwen3.5-9B on shared benchmarks: Qwen is overall winner beating gemma in 5/8 benchmarks despite a smaller footprint I don't really understand the gemma hype. Qwen outperforms gemma gb for gb, and kv cache is lighter.
Sure gemma-4-12b-it might be a slightly better coder than Qwen3.5-9b, but you could also just use omnicoder-9b (Qwen3.5-9b finetune for coding). Note: Benchmark results come from… 19

r/LocalLLaMA community 28d ago google/gemma-4-12B · Hugging Face Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on E2B, E4B, and 12B) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned… 29

Hugging Face Daily Papers research 29d ago Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking Abstract Humanoid-GPT is a GPT-style Transformer with causal attention trained on a billion-scale motion corpus that achieves zero-shot generalization to unseen motions and control tasks through scalable pre-training on diverse motion data. Generated by… 29

r/LocalLLaMA community 29d ago Holo3.1 35B/9B/4B/0.8B (Qwen 3.5 finetunes) from Hcompany (which seems to be a French company): Holo3.1: Fast & Local Computer Use Agents Model Description Holo3.1 is our latest family of Vision-Language Models (VLMs) for computer use agents. Building on Holo3, it expands support beyond browser and desktop automation to… 25

Hugging Face Daily Papers research 29d ago Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces Abstract Answer-correct long chain-of-thought traces can lead to different fine-tuning outcomes, with post-conclusion continuations identified as harmful to training, characterized by uncertainty-geometry mismatches and addressed through a lightweight boundary proxy method.… 26

arXiv — Machine Learning research 29d ago Pruning Deep Neural Networks via the Marchenko--Pastur Distribution arXiv:2606.02608v1 Announce Type: new Abstract: We study a Marchenko--Pastur (MP) random-matrix approach to pruning deep neural networks with very small post-pruning fine-tuning budgets.
The main practical contribution is accuracy retention under short calibration and… 34

arXiv — Machine Learning research 29d ago GRZO: Group-Relative Zeroth-Order Optimization for Large Language Model Fine-Tuning arXiv:2606.02857v1 Announce Type: new Abstract: Zeroth-order (ZO) optimization is a memory-efficient alternative to backpropagation for fine-tuning large language models, but its deployment is limited by the high variance of gradient estimation. We propose GRZO, a Group-Relative… 22

arXiv — Machine Learning research 29d ago BYORn: Bootstrap Your Own Responses to Defend Large Vision-Language Models Against Backdoor Attacks arXiv:2606.02947v1 Announce Type: new Abstract: Supervised fine-tuning is the predominant approach for adapting autoregressive vision-language models to downstream tasks. Recent work has shown that this paradigm is highly vulnerable to backdoor attacks, and that existing… 17

arXiv — Machine Learning research 29d ago CoughSense: Five-Class Respiratory Disease Classification via Whisper Encoder Fine-Tuning and Dual-Encoder Cross-Attention Fusion with Balanced Contrastive Learning arXiv:2606.02998v1 Announce Type: new Abstract: Automated cough analysis offers a path to low-cost respiratory screening, but most existing work stops at binary COVID-19 detection. A practical tool needs to tell apart several respiratory conditions from one cough recording on a… 4

arXiv — Machine Learning research 29d ago DECA: Decentralizing Block-Wise Adam for Efficient LLM Full-Parameter Fine-Tuning on Non-IID Data arXiv:2606.03209v1 Announce Type: new Abstract: Fine-tuning large language models (LLMs) in privacy-sensitive and resource-constrained environments remains challenging.
Since training data are often distributed across multiple clients, decentralized fine-tuning offers a natural… 15

arXiv — Machine Learning research 29d ago When RLHF Fails: A Mechanistic Taxonomy of Reward Hacking, Collapse, and Evaluator Gaming arXiv:2606.03238v1 Announce Type: new Abstract: Reinforcement learning from human feedback (RLHF) makes large-scale post-training possible by replacing an underspecified human objective with learned and scalable proxies. The same substitution creates a structured failure… 12

arXiv — Machine Learning research 29d ago Message Tuning Outshines Graph Prompt Tuning: A Prismatic Space Perspective arXiv:2606.03290v1 Announce Type: new Abstract: Graph Foundation Models (GFMs), built upon the Pre-training and Adaptation paradigm, have emerged as a research hotspot in graph learning. For GNN-based GFMs, graph prompt tuning has become the prevailing adaptation method for… 4

arXiv — NLP / Computation & Language research 29d ago Regret Pre-training: Bridging Prior and Posterior Views for Enhanced Knowledge Grounding arXiv:2606.03080v1 Announce Type: new Abstract: Causal language models factorize sequence probabilities using only preceding context, leaving future information unexploited during training despite its availability in the training data. This paper introduces Regret Pre-training,… 31

arXiv — NLP / Computation & Language research 29d ago The Word and the Way: Strategies for Domain-Specific BERT Pre-Training in German Medical NLP arXiv:2606.03250v1 Announce Type: new Abstract: Digital healthcare generates vast amounts of clinical text that can support AI-assisted applications, yet German biomedical language models remain limited by older architectures or restricted training data.
We present ChristBERT… 33

arXiv — NLP / Computation & Language research 29d ago From Script to Semantics: Prompting Strategies for African NLI arXiv:2606.03304v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly evaluated in multilingual settings, yet their inference behavior in low-resource African languages remains underexplored especially under pure prompting without fine-tuning. We present… 38

arXiv — NLP / Computation & Language research 29d ago Large Language Models Are Overconfident in Their Own Responses arXiv:2606.03437v1 Announce Type: new Abstract: Prior work has shown that instruction-tuned large language models (LLMs) are less well calibrated than their base pre-trained counterparts. However, little is known about the frequently used chat template's effect on the… 10

arXiv — NLP / Computation & Language research 29d ago AutoTail-BSFGM: Class-Balance-Aware Fine-Tuning for Chinese Scholarly Text Classification arXiv:2606.03576v1 Announce Type: new Abstract: Scholarly text classification supports literature organization, subject indexing, and research intelligence, but Chinese scholarly corpora often contain imbalanced and semantically adjacent disciplinary labels. We propose… 5

arXiv — NLP / Computation & Language research 29d ago Safety Measurements for Fine-tuned LLMs Should be Grounded in Capability arXiv:2606.03648v1 Announce Type: new Abstract: Adapting foundation large language models to a user's task or preferred style through fine-tuning can result in compromising the model's safety. Previous works examined the effects of fine-tuning on model safety in limited and… 32

r/LocalLLaMA community 29d ago Remember around 2023-2024 when we did partys (wizardlm, nous capybara and dolphin) and finetunes? Yes, I remember it. It was peak. Now those models get outperformed by 2026-era models.
I want to revive this era I miss it so bad 😞 submitted by /u/Ok-Type-7663 29

llama.cpp releases dev-tools 1mo ago b9468 server: real-time reasoning interruption via control endpoint (#23971) server: real-time reasoning interruption via control endpoint Builds on the manual reasoning budget trigger from #23949. Adds a CONTROL task that mirrors the CANCEL path on the live slot and calls… 17

arXiv — Machine Learning research 1mo ago From Demonstrations to Rewards: Test-Time Prompt Optimization for VLM Reward Models arXiv:2606.00083v1 Announce Type: new Abstract: Reinforcement learning relies on accurate reward functions, which are often hand-crafted or even unavailable in real-world applications, such as robotics. Recent work has explored the zero-shot reasoning capabilities of pre-trained… 11

arXiv — Machine Learning research 1mo ago RAFT: Data Refinement and Adaptive Distillation for Domain Fine-Tuning with Alleviated Forgetting arXiv:2606.00147v1 Announce Type: new Abstract: Domain-specific supervised fine-tuning (SFT) often improves in-domain performance at the cost of degrading a model's general capabilities. We view this degradation through two practical gaps in domain SFT: a… 10

arXiv — Machine Learning research 1mo ago A Pre-Training Analogue of Grokking in Language Models: Tracing Delayed Grammatical Generalization arXiv:2606.00230v1 Announce Type: new Abstract: Grokking, the phenomenon in which neural networks generalize long after fitting their training data, has been studied in supervised settings on many epochs.
LLM pre-training instead involves next-token prediction over an unlabeled… 25

arXiv — Machine Learning research 1mo ago ARCA: Adapter-Residual Credit Assignment When Token Signals Degenerate arXiv:2606.00257v1 Announce Type: new Abstract: Token-level credit assignment for language-model reinforcement learning is usually formulated as if the policy were fully trainable, while practical LLM-RL pipelines often rely on parameter-efficient fine-tuning, especially LoRA.… 9

arXiv — Machine Learning research 1mo ago CRMA: A Spectrally-Bounded Backbone for Modular Continual Fine-Tuning of LLMs arXiv:2606.00382v1 Announce Type: new Abstract: Sequential fine-tuning of large language models forces a choice: let the shared substrate keep learning and accept catastrophic forgetting, or freeze it after task one and foreclose cross-task refinement. Per-task adapter methods… 23

arXiv — Machine Learning research 1mo ago Escaping the Mode Lottery: Multi-Response Training Improves Language Model Generalization arXiv:2606.00544v1 Announce Type: new Abstract: Modern language-model fine-tuning typically pairs each prompt with a single response, even though many prompts admit multiple valid completions. This effectively reduces a multi-modal conditional distribution to a one-sample view,… 13

arXiv — NLP / Computation & Language research 1mo ago LinguIUTics at PsyDefDetect: Iterative Imbalance-Aware Fine-tuning of Qwen3-8B for Psychological Defense Mechanism Classification arXiv:2606.00647v1 Announce Type: new Abstract: Detecting psychological defense mechanisms in conversational text remains a challenging clinical NLP problem.
For the PsyDefDetect 2026 shared task (nine-class utterance classification evaluated via macro F1), our team LinguIUTics… 5

Hugging Face Daily Papers research 1mo ago LongAttnComp: Cross-Family Context Compression for Long-Context Reasoning Abstract LongAttnComp adapts AttnComp for long-context processing by fine-tuning lightweight attention layers and implementing token-level chunking and positional reordering techniques. AI-generated summary As real-world applications increasingly require processing inputs of… 27

Hugging Face Daily Papers research 1mo ago On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters Abstract Parameter-efficient fine-tuning can function as a compact substrate for persistent personal models by enabling small trainable adapters to store instance-specific behaviors on top of strong foundation models. AI-generated summary Parameter-efficient fine-tuning (PEFT)… 21

Hugging Face Daily Papers research 1mo ago Draft-OPD: On-Policy Distillation for Speculative Draft Models Abstract Speculative decoding uses a lightweight draft model to accelerate large language model inference, but supervised fine-tuning plateaus due to offline-to-inference mismatch, which is addressed through on-policy distillation with target-assisted rollouts and error replay.… 29

Hugging Face Daily Papers research 1mo ago Compositional Text-to-Image Generation Via Region-aware Bimodal Direct Preference Optimization Abstract BiDPO enhances text-to-image models for complex compositional prompts through preference-based fine-tuning and region-level guidance.
AI-generated summary Despite the rapid progress of text-to-image (T2I) models, generating images that accurately reflect complex… 18

Hugging Face Daily Papers research 1mo ago NITP: Next Implicit Token Prediction for LLM Pre-training Abstract Next Implicit Token Prediction enhances language model training by adding dense continuous supervision in representation space, improving generalization and performance across model sizes with minimal computational overhead. AI-generated summary Standard next-token… 34

r/MachineLearning community 1mo ago [P] Free AI Agent Security Assessment [P] Hey everyone, We’re building Antitech, a security layer for AI agents and LLM-powered workflows. We’re opening a small number of free early-access assessments for teams/builders working on AI agents. If you give us access to an endpoint of a Dockerized / sandboxed environment… 8

Hugging Face Daily Papers research 1mo ago DRIFT: Decoupled Rollouts and Importance-Weighted Fine-Tuning for Efficient Multi-Turn Optimization Abstract DRIFT is a framework that combines offline trajectories with importance-weighted supervised fine-tuning to achieve multi-turn interactive learning efficiency and performance comparable to reinforcement learning. AI-generated summary Large language models are… 38

Hugging Face Daily Papers research 1mo ago The Flip Side of RLHF: On-Policy Feedback for Reward Model Self-Supervised Improvement Abstract SAVE framework improves reward model training by using value functions to grade on-policy responses and update models through contrastive objectives.
AI-generated summary Building strong reward models (RMs) for language model alignment is bottlenecked by the cost and… 26

arXiv — Machine Learning research 1mo ago The Long-Term Effects of Data Selection in LLM Fine-Tuning arXiv:2605.30537v1 Announce Type: new Abstract: Data selection is increasingly used to reduce the cost of large language model (LLM) fine-tuning, with recent methods prioritizing samples by current utility, diversity, quality, or influence. This paper studies a different… 16

arXiv — Machine Learning research 1mo ago CSULoRA: Closest Safe Update Low-Rank Adaptation arXiv:2605.30640v1 Announce Type: new Abstract: Low-rank adaptation has become a standard method for parameter-efficient fine-tuning of large language models, but even small amounts of unsafe or adversarial fine-tuning data can substantially weaken the safety behavior of aligned… 28

arXiv — Machine Learning research 1mo ago SemStruct: Contextualizing Semantic Embeddings with Structural Information for Schema Matching arXiv:2605.30729v1 Announce Type: new Abstract: Schema matching is a fundamental step in integrating heterogeneous data sources. While Pre-trained Language Models (PLMs) have revolutionized this task by capturing linguistic semantics, they typically process tabular data as… 35

arXiv — Machine Learning research 1mo ago Efficient and Uncertainty-Aware Diffusion Framework for Offline-to-Online Reinforcement Learning arXiv:2605.30776v1 Announce Type: new Abstract: Offline-to-Online Reinforcement Learning (O2O-RL) leverages an offline, pre-trained policy to minimize costly online interactions.
Although data-efficient, O2O-RL is susceptible to shifts between offline and online distributions.… 8

arXiv — NLP / Computation & Language research 1mo ago Fine-Tuning Improves Information Conveyance in Language Models arXiv:2605.30844v1 Announce Type: new Abstract: Fine-tuning is often believed to reduce uncertainty and diversity in large language models, but existing analyses overlook output length, a key confounder, and therefore fail to capture how uncertainty is distributed across an… 27

arXiv — NLP / Computation & Language research 1mo ago MADS: Model-Aware Diverse Core Set Selection for Instruction Tuning arXiv:2605.30857v1 Announce Type: new Abstract: Instruction fine-tuning is employed to enhance the instruction-following ability of large language models (LLMs). As the amount of instruction fine-tuning data increases, selecting the optimal core set becomes particularly… 16

arXiv — NLP / Computation & Language research 1mo ago The Flip Side of RLHF: On-Policy Feedback for Reward Model Self-Supervised Improvement arXiv:2605.30888v1 Announce Type: new Abstract: Building strong reward models (RMs) for language model alignment is bottlenecked by the cost and difficulty of acquiring diverse and reliable preference data from human annotation or judge models. It is dramatically worse as the… 31

Page 5 of 9 · 450 articles