Training: 450 articles archived under #training

arXiv — Machine Learning research 1mo ago Truthful Online Preference Aggregation for LLM Fine-Tuning in Mobile Crowdsourcing arXiv:2605.24052v1 Announce Type: new Abstract: To better serve users' demands in mobile applications (e.g., navigation), mobile crowdsourcing platforms can iteratively align large language model (LLM)-generated content (e.g., AI-generated traffic condition predictions) with… 10

arXiv — Machine Learning research 1mo ago Signs Beat Floats: Low-Rank Double-Binary Adaptation for On-Device Fine-Tuning arXiv:2605.24058v1 Announce Type: new Abstract: On-device adaptation of large language models commonly keeps a quantized base model frozen while training and deploying a small, task-specific LoRA adapter. In the unmerged adapter-mode setting, however, the adapter is more than a… 28

arXiv — Machine Learning research 1mo ago Bilevel Optimization of Synthetic Trajectories for Multi-Turn LLM Fine-Tuning arXiv:2605.24743v1 Announce Type: new Abstract: While LLMs excel at single-turn generation, they struggle with long-horizon, multi-turn interactions. Offline reinforcement learning (RL) offers a scalable approach, yet its performance hinges on the availability and quality of… 34

arXiv — NLP / Computation & Language research 1mo ago Temporal Concept Drift in Legal Judgment Prediction: Neural Baselines Across Three Epochs of Ukrainian Court Decisions arXiv:2605.24452v1 Announce Type: new Abstract: Legal NLP benchmarks evaluate models on randomly split data, implicitly assuming that legal language is stationary.
We test this assumption by fine-tuning four transformer encoders -- XLM-RoBERTa (base and large) and their… 15

arXiv — NLP / Computation & Language research 1mo ago Mix-MoE: Improving Multilingual Machine Translation of Large Language Models through Mixed MoEs arXiv:2605.24681v1 Announce Type: new Abstract: Large Language Models (LLMs) have shown great promise in multilingual machine translation (MT), even with limited bilingual supervision. However, fine-tuning LLMs with parallel corpora presents major challenges, namely parameter… 28

arXiv — NLP / Computation & Language research 1mo ago NITP: Next Implicit Token Prediction for LLM Pre-training arXiv:2605.24956v1 Announce Type: new Abstract: Standard next-token prediction (NTP) supervises language models solely through discrete labels in the output logit space. We argue that this sparse one-hot supervision leaves the latent representation space under-constrained,… 23

arXiv — Machine Learning research 1mo ago FuRA: Full-Rank Parameter-Efficient Fine-Tuning with Spectral Preconditioning arXiv:2605.22869v1 Announce Type: new Abstract: Both full fine-tuning (Full FT) and parameter-efficient fine-tuning methods such as LoRA introduce weight updates without accounting for the spectral structure established during pretraining. As a result, noisy gradients from… 36

arXiv — Machine Learning research 1mo ago Understanding and Improving Noisy Embedding Techniques in Instruction Finetuning arXiv:2605.23171v1 Announce Type: new Abstract: Recent advancements in instructional fine-tuning have injected noise into embeddings, with NEFTune (Jain et al., 2024) setting benchmarks using uniform noise.
Despite NEFTune's empirical findings that uniform noise outperforms… 37

arXiv — Machine Learning research 1mo ago RelPrism: A Multi-Faceted Pre-training Framework with Self-Generated Tasks for Relational Databases arXiv:2605.23241v1 Announce Type: new Abstract: Relational databases (RDBs) remain the cornerstone of modern data systems and support diverse predictive tasks. Recent relational deep learning (RDL) methods enable end-to-end prediction by converting RDBs into graphs, where rows… 16

arXiv — Machine Learning research 1mo ago Convex Optimization for Alignment and Preference Learning on a Single GPU arXiv:2605.23244v1 Announce Type: new Abstract: Fine-tuning large language models (LLMs) to align with human preferences has driven the success of systems such as Gemini and ChatGPT. However, approaches like Reinforcement Learning from Human Feedback (RLHF) remain… 20

arXiv — Machine Learning research 1mo ago Diffusion Domain Expansion: Learning to Coordinate Pre-trained Diffusion Models arXiv:2605.23275v1 Announce Type: new Abstract: In this paper, we propose Diffusion Domain Expansion (DDE), a method that efficiently extends pre-trained diffusion models to generate larger objects and handle more complex conditioning beyond their original capabilities. Our… 27

arXiv — NLP / Computation & Language research 1mo ago Learnability-Informed Fine-Tuning of Diffusion Language Models arXiv:2605.22939v1 Announce Type: new Abstract: We aim to improve the reasoning capabilities of diffusion language models (DLMs).
While SFT is a popular post-training recipe for autoregressive models, its use in DLMs faces challenges and can even hurt performance, though the… 20

arXiv — NLP / Computation & Language research 1mo ago Structure-Guided Entity Resolution: Fine-Tuning LLMs for Robust Name Matching in Complex Linguistic Contexts arXiv:2605.23597v1 Announce Type: new Abstract: Matching person names across heterogeneous records is a core challenge in entity resolution, especially within linguistically and culturally complex environments. Variations in naming conventions, inconsistent transliteration… 24

arXiv — NLP / Computation & Language research 1mo ago Is a Document Educational or Just Wikipedia-Style? -- Pitfalls of Classifier-Based Quality Filtering arXiv:2605.23721v1 Announce Type: new Abstract: Classifier-based Quality Filtering has recently emerged as a fundamental technique in constructing pre-training corpora. The ability to deploy a single model that can replace or supplement a set of heuristics has proven effective… 35

arXiv — NLP / Computation & Language research 1mo ago Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum arXiv:2510.00526v3 Announce Type: replace Abstract: Supervised fine-tuning (SFT) is the standard approach for post-training large language models (LLMs), yet it often shows limited generalization. We trace this limitation to its default training objective: negative log… 13

arXiv — NLP / Computation & Language research 1mo ago Fine-Tuning Causal LLMs for Text Classification: Embedding-Based vs. Instruction-Based Approaches arXiv:2512.12677v2 Announce Type: replace Abstract: We explore efficient strategies to fine-tune decoder-only Large Language Models (LLMs) for downstream text classification under resource constraints.
Two approaches are investigated: (1) attaching a classification head to a… 4

r/LocalLLaMA community 1mo ago llama.cpp has a clever trick for speeding up KV cache decode So, I use llama-server as my endpoint to run local models and connect them to Open-WebUI, Hermes, and OpenCode. But since llama.cpp's webUI has been receiving a lot of updates, I took a look at its settings and noticed a particular one under developer options. This is the… 23

r/LocalLLaMA community 1mo ago G4-MeroMero-26B-A4B-it-uncensored-heretic Is Out Now, a Finetune of gemma-4-26B-A4B-it, With KLD of 0.0152 and 12/100 Refusals! When I previously posted the uncensored version of the 31B version of the MeroMero finetune, quite a few people asked for the 26B-A4B version, I wasn't so keen on it because I considered the 31B to be the better version, but I understand that people might want the 26B-A4B… 20

Hugging Face Daily Papers research 1mo ago Live Music Diffusion Models: Efficient Fine-Tuning and Post-Training of Interactive Diffusion Music Generators Abstract Audio diffusion models are adapted for interactive music generation through efficient block-wise processing and novel training paradigms that enable real-time performance on consumer hardware. AI-generated summary Interactive streaming music generation promises the use… 11

r/LocalLLaMA community 1mo ago Low-level coding dataset Hi all, I've recently been thinking about putting together a community sourced coding dataset for finetuning models, with a heavy focus on cpp and systems programming. My goal is to eventually have a model (say a finetune of Qwen3.6-27b) that is good at stuff like memory… 15

arXiv — Machine Learning research 1mo ago From Parameters to Data: A Task-Parameter-Guided Fine-Tuning Pipeline for Efficient LLM Alignment arXiv:2605.21558v1 Announce Type: new Abstract: Adapting Large Language Models (LLMs) to specialized domains typically incurs high data and computational overhead.
While prior efficiency efforts have largely treated data selection and parameter-efficient fine-tuning as isolated… 38

arXiv — NLP / Computation & Language research 1mo ago Token-weighted Direct Preference Optimization with Attention arXiv:2605.21883v1 Announce Type: new Abstract: Direct Preference Optimization (DPO) aligns Large Language Models with human preferences without the need for a separate reward model. However, DPO treats all tokens in responses equally, neglecting the differing importance of… 5

arXiv — NLP / Computation & Language research 1mo ago Modeling Pathology-Like Behavioral Patterns in Language Models Through Behavioral Fine-Tuning arXiv:2605.22356v1 Announce Type: new Abstract: Large language models are increasingly used as computational tools for modeling human-like behavior. We introduce a behavioral induction framework that modifies model policies through fine-tuning on structured decision-making… 19

arXiv — NLP / Computation & Language research 1mo ago Beyond Temperature: Hyperfitting as a Late-Stage Geometric Expansion arXiv:2605.22579v1 Announce Type: new Abstract: Recent work has identified a counterintuitive phenomenon termed "Hyperfitting", where fine-tuning Large Language Models (LLMs) to near-zero training loss on small datasets surprisingly enhances open-ended generation quality and… 16

arXiv — NLP / Computation & Language research 1mo ago Understanding Data Temporality Impact on Large Language Models Pre-training arXiv:2605.22769v1 Announce Type: new Abstract: Large language models (LLMs) are typically trained on shuffled corpora, yielding models whose knowledge is frozen at train time and whose temporal grounding remains poorly understood. In this work, we study the impact of… 14

llama.cpp releases dev-tools 1mo ago b9276 server: expose prompt token counts in /slots endpoint (#23454) Add n_prompt_tokens, n_prompt_tokens_processed, and n_prompt_tokens_cache to the /slots JSON response.
These fields are already tracked internally but were not exposed, making it impossible for clients to monitor… 15

r/LocalLLaMA community 1mo ago LatitudeGames/Equinox-31B · Hugging Face new model from LatitudeGames - Gemma 31B finetune https://huggingface.co/LatitudeGames/Equinox-31B-GGUF Equinox draws its name from the balance between extremes. Trained on a balanced blend of Wayfarer 2's unforgiving dark adventures and Hearthfire's quiet slice-of-life… 14

r/LocalLLaMA community 1mo ago I'm running an agentic system with kobold.cpp as my backend. Am I losing performance? Currently, I'm running a Hermes agent with an OpenAI v1 compatible endpoint provided by Kobold. My setup is a 24GB 3090Ti + 512GB DDR4 running Qwen3.6-35B-A3B. I plan to move to a larger MoE model once I'm satisfied with how everything is working, but I'm just wondering if I'm… 33

Hugging Face Daily Papers research 1mo ago Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining Abstract A large-scale GUI dataset was created by automatically extracting interaction trajectories from internet videos, enabling improved performance in GUI agents through pre-training on this diverse collection. AI-generated summary Recent advances in multimodal large… 35

arXiv — Machine Learning research 1mo ago Spectral Unforgetting: Post-Hoc Recovery of Damaged Capabilities Without Retraining arXiv:2605.20296v1 Announce Type: new Abstract: Fine-tuning a language model for a target task routinely degrades capabilities the training data never explicitly threatened.
We study this phenomenon, known as catastrophic forgetting, and propose a post-hoc repair solution that… 17

arXiv — Machine Learning research 1mo ago Spectral Souping: A Unified Framework for Online Preference Alignment arXiv:2605.20408v1 Announce Type: new Abstract: Reinforcement Learning from Human Feedback (RLHF) effectively aligns Large Language Models (LLMs) with aggregate human preferences but often fails to address the diverse and conflicting needs of individual users. To overcome this… 26

arXiv — Machine Learning research 1mo ago An exponential mechanism based on quadratic approximations for fine-tuning machine learning models with privacy guarantees arXiv:2605.20521v1 Announce Type: new Abstract: Fine-tuning adapts a pretrained machine learning model to a small, sensitive dataset, but this process risks memorizing individual new data points, making the model vulnerable to adversaries who seek to extract sensitive… 13

arXiv — Machine Learning research 1mo ago Modular Multimodal Classification Without Fine-Tuning: A Simple Compositional Approach arXiv:2605.20674v1 Announce Type: new Abstract: We introduce CoMET, Composing Modality Encoders with Tabular foundation models, a simple yet highly competitive method for multimodal classification: pass each modality through a frozen… 24

arXiv — NLP / Computation & Language research 1mo ago FlowLM: Few-Step Language Modeling via Diffusion-to-Flow Adaptation arXiv:2605.20199v1 Announce Type: new Abstract: We present FlowLM, a flow matching language model transformed from pre-trained diffusion language models via efficient fine-tuning.
By re-aligning the curved sampling trajectories of diffusion models into straight-line flows,… 4

arXiv — NLP / Computation & Language research 1mo ago Memory Grafting: Scaling Language Model Pre-training via Offline Conditional Memory arXiv:2605.20948v1 Announce Type: new Abstract: Scaling conditional memory offers a promising way to increase language-model capacity, but existing methods such as Engram learn large memory tables from scratch during pre-training, making memory scaling expensive and sometimes… 12

arXiv — NLP / Computation & Language research 1mo ago SymbolicLight V1: Spike-Gated Dual-Path Language Modeling with High Activation Sparsity and Sub-Billion-Scale Pre-Training Evidence arXiv:2605.21333v1 Announce Type: new Abstract: Natively trained spiking language models struggle to combine Transformer-like language quality, stable multi-domain pre-training, and high activation sparsity. We present SymbolicLight V1, a spike-gated dual-path language model… 7

arXiv — NLP / Computation & Language research 1mo ago SMoA: Spectrum Modulation Adapter for Parameter-Efficient Fine-Tuning arXiv:2605.21147v1 Announce Type: cross Abstract: As the number of model parameters increases, parameter-efficient fine-tuning (PEFT) has become the go-to choice for tailoring pre-trained large language models.
Low-rank Adaptation (LoRA) uses a low-rank update method to simulate… 24

Hugging Face Daily Papers research 1mo ago Conditional Equivalence of DPO and RLHF: Implicit Assumption, Failure Modes, and Provable Alignment Abstract Direct Preference Optimization (DPO) is theoretically equivalent to Reinforcement Learning from Human Feedback (RLHF) only under specific assumptions, otherwise optimizing different objectives; Constrained Preference Optimization (CPO) is proposed as a solution with… 17

Hugging Face Daily Papers research 1mo ago Zero-Shot Sim-to-Real Robot Learning: A Dexterous Manipulation Study on Reactive Catching Abstract Domain-Randomized Instance Set (DRIS) enables robust policy learning for dexterous manipulation tasks by simultaneously representing multiple randomized instances, achieving strong sim-to-real transfer without extensive real-world fine-tuning. AI-generated summary… 19

Hugging Face Daily Papers research 1mo ago Why Do Reasoning Models Lose Coverage? The Role of Data and Forks in the Road Abstract Reasoning models exhibit coverage shrinkage during supervised fine-tuning due to decision-point scenarios in training data, which can be mitigated through targeted data synthesis and diversity-encouraging decoding mechanisms. AI-generated summary Recent progress in… 30

r/LocalLLaMA community 1mo ago A streamlined Hugging Face model search utility coded by Qwen 3.6-27B Hi all. As some may have been aware, Hugging Face's model search had issues recently. (It seems to be resolved now though).
I also often find myself struggling with the standard search interface when trying to find new derivative quants or finetunes of some particular models,… 24

Hugging Face Daily Papers research 1mo ago Overcoming Catastrophic Forgetting in Visual Continual Learning with Reinforcement Fine-Tuning Abstract Reinforcement Fine-Tuning suffers from catastrophic forgetting in visual continual learning, which is addressed through Retention-aware Policy Optimization that uses trajectory-level reward shaping and cross-task advantage normalization. AI-generated summary Recent… 11

arXiv — Machine Learning research 1mo ago HELLoRA: Hot Experts Layer-Level Low-Rank Adaptation for Mixture-of-Experts Models arXiv:2605.18795v1 Announce Type: new Abstract: Low-Rank Adaptation (LoRA) dominates parameter-efficient fine-tuning of large language models, yet most variants target dense architectures. Mixture-of-Experts (MoE) models scale parameters at near-constant per-token compute, and… 27

arXiv — Machine Learning research 1mo ago DynaTrain: Fast Online Parallelism Switching for Elastic LLM Training arXiv:2605.18815v1 Announce Type: new Abstract: Modern large language model (LLM) training is inherently dynamic: resource fluctuations, RLHF phase shifts, and cluster elasticity continually reshape the optimal parallelism layout, posing a significant challenge to existing… 22

arXiv — Machine Learning research 1mo ago Hybrid-LoRA: Bridging Full Fine-Tuning and Low-Rank Adaptation for Post-Training arXiv:2605.18822v1 Announce Type: new Abstract: Post-training has become essential for adapting large language models (LLMs) to complex downstream behaviors, including instruction following, preference alignment, and multi-step reasoning.
Reinforcement learning with verifiable… 28

arXiv — Machine Learning research 1mo ago TEMPO: Temporal Enforcement via Mode-Separated Policy Optimization for Trustworthy LLM Backtesting arXiv:2605.18843v1 Announce Type: new Abstract: Backtesting large language models on historical events requires reasoning exclusively from information available before a specified cutoff date. Yet models routinely leak post-cutoff knowledge from pre-training into their… 37

arXiv — Machine Learning research 1mo ago HypergraphFormer: Learning Hypergraphs from LLMs for Editable Floor Plan Generation arXiv:2605.18932v1 Announce Type: new Abstract: In this work, we propose HypergraphFormer, a novel and efficient approach to floor plan generation based on learning hypergraph representations with a large language model (LLM). The model is trained via supervised fine-tuning to… 18

arXiv — Machine Learning research 1mo ago Distilling Linearized Behavior for Effective Task Arithmetic arXiv:2605.18993v1 Announce Type: new Abstract: Task vector composition has emerged as a promising paradigm for editing pre-trained models, enabling model merging through addition and unlearning through subtraction. Fine-tuning in the tangent space of a pre-trained model (linear… 20

arXiv — Machine Learning research 1mo ago LoRA vs. Full Fine-Tuning: A Theoretical Perspective arXiv:2605.19018v1 Announce Type: new Abstract: Fine-tuning adapts a pre-trained model to downstream tasks using a small amount of labeled data. Low-Rank Adaptation (LoRA) is an efficient fine-tuning method that reduces memory and computation costs while often achieving… 25

arXiv — Machine Learning research 1mo ago Learning When to Adapt arXiv:2605.19028v1 Announce Type: new Abstract: Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning method, yet its learned correction is static: the same low-rank update is applied to every input.
This input-agnostic approach creates an inevitable… 38