Tag

Training

450 articles archived under #training

arXiv — Machine Learning research 1mo ago

Truthful Online Preference Aggregation for LLM Fine-Tuning in Mobile Crowdsourcing

arXiv:2605.24052v1 Announce Type: new Abstract: To better serve users' demands in mobile applications (e.g., navigation), mobile crowdsourcing platforms can iteratively align large language model (LLM)-generated content (e.g., AI-generated traffic condition predictions) with…

10
arXiv — Machine Learning research 1mo ago

Signs Beat Floats: Low-Rank Double-Binary Adaptation for On-Device Fine-Tuning

arXiv:2605.24058v1 Announce Type: new Abstract: On-device adaptation of large language models commonly keeps a quantized base model frozen while training and deploying a small, task-specific LoRA adapter. In the unmerged adapter-mode setting, however, the adapter is more than a…

28
arXiv — Machine Learning research 1mo ago

Bilevel Optimization of Synthetic Trajectories for Multi-Turn LLM Fine-Tuning

arXiv:2605.24743v1 Announce Type: new Abstract: While LLMs excel at single-turn generation, they struggle with long-horizon, multi-turn interactions. Offline reinforcement learning (RL) offers a scalable approach, yet its performance hinges on the availability and quality of…

34
arXiv — NLP / Computation & Language research 1mo ago

Temporal Concept Drift in Legal Judgment Prediction: Neural Baselines Across Three Epochs of Ukrainian Court Decisions

arXiv:2605.24452v1 Announce Type: new Abstract: Legal NLP benchmarks evaluate models on randomly split data, implicitly assuming that legal language is stationary. We test this assumption by fine-tuning four transformer encoders -- XLM-RoBERTa (base and large) and their…

15
arXiv — NLP / Computation & Language research 1mo ago

Mix-MoE: Improving Multilingual Machine Translation of Large Language Models through Mixed MoEs

arXiv:2605.24681v1 Announce Type: new Abstract: Large Language Models (LLMs) have shown great promise in multilingual machine translation (MT), even with limited bilingual supervision. However, fine-tuning LLMs with parallel corpora presents major challenges, namely parameter…

28
arXiv — NLP / Computation & Language research 1mo ago

NITP: Next Implicit Token Prediction for LLM Pre-training

arXiv:2605.24956v1 Announce Type: new Abstract: Standard next-token prediction (NTP) supervises language models solely through discrete labels in the output logit space. We argue that this sparse one-hot supervision leaves the latent representation space under-constrained,…

23
arXiv — Machine Learning research 1mo ago

FuRA: Full-Rank Parameter-Efficient Fine-Tuning with Spectral Preconditioning

arXiv:2605.22869v1 Announce Type: new Abstract: Both full fine-tuning (Full FT) and parameter-efficient fine-tuning methods such as LoRA introduce weight updates without accounting for the spectral structure established during pretraining. As a result, noisy gradients from…

36
arXiv — Machine Learning research 1mo ago

Understanding and Improving Noisy Embedding Techniques in Instruction Finetuning

arXiv:2605.23171v1 Announce Type: new Abstract: Recent advancements in instructional fine-tuning have injected noise into embeddings, with NEFTune (Jain et al., 2024) setting benchmarks using uniform noise. Despite NEFTune's empirical findings that uniform noise outperforms…

37
arXiv — Machine Learning research 1mo ago

RelPrism: A Multi-Faceted Pre-training Framework with Self-Generated Tasks for Relational Databases

arXiv:2605.23241v1 Announce Type: new Abstract: Relational databases (RDBs) remain the cornerstone of modern data systems and support diverse predictive tasks. Recent relational deep learning (RDL) methods enable end-to-end prediction by converting RDBs into graphs, where rows…

16
arXiv — Machine Learning research 1mo ago

Convex Optimization for Alignment and Preference Learning on a Single GPU

arXiv:2605.23244v1 Announce Type: new Abstract: Fine-tuning large language models (LLMs) to align with human preferences has driven the success of systems such as Gemini and ChatGPT. However, approaches like Reinforcement Learning from Human Feedback (RLHF) remain…

20
arXiv — Machine Learning research 1mo ago

Diffusion Domain Expansion: Learning to Coordinate Pre-trained Diffusion Models

arXiv:2605.23275v1 Announce Type: new Abstract: In this paper, we propose Diffusion Domain Expansion (DDE), a method that efficiently extends pre-trained diffusion models to generate larger objects and handle more complex conditioning beyond their original capabilities. Our…

27
arXiv — NLP / Computation & Language research 1mo ago

Learnability-Informed Fine-Tuning of Diffusion Language Models

arXiv:2605.22939v1 Announce Type: new Abstract: We aim to improve the reasoning capabilities of diffusion language models (DLMs). While SFT is a popular post-training recipe for autoregressive models, its use in DLMs faces challenges and can even hurt performance, though the…

20
arXiv — NLP / Computation & Language research 1mo ago

Structure-Guided Entity Resolution: Fine-Tuning LLMs for Robust Name Matching in Complex Linguistic Contexts

arXiv:2605.23597v1 Announce Type: new Abstract: Matching person names across heterogeneous records is a core challenge in entity resolution, especially within linguistically and culturally complex environments. Variations in naming conventions, inconsistent transliteration…

24
arXiv — NLP / Computation & Language research 1mo ago

Is a Document Educational or Just Wikipedia-Style? -- Pitfalls of Classifier-Based Quality Filtering

arXiv:2605.23721v1 Announce Type: new Abstract: Classifier-based Quality Filtering has recently emerged as a fundamental technique in constructing pre-training corpora. The ability to deploy a single model that can replace or supplement a set of heuristics has proven effective…

35
arXiv — NLP / Computation & Language research 1mo ago

Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum

arXiv:2510.00526v3 Announce Type: replace Abstract: Supervised fine-tuning (SFT) is the standard approach for post-training large language models (LLMs), yet it often shows limited generalization. We trace this limitation to its default training objective: negative log…

13
arXiv — NLP / Computation & Language research 1mo ago

Fine-Tuning Causal LLMs for Text Classification: Embedding-Based vs. Instruction-Based Approaches

arXiv:2512.12677v2 Announce Type: replace Abstract: We explore efficient strategies to fine-tune decoder-only Large Language Models (LLMs) for downstream text classification under resource constraints. Two approaches are investigated: (1) attaching a classification head to a…

4
r/LocalLLaMA community 1mo ago

llama.cpp has a clever trick for speeding up KV cache decode

So, I use llama-server as my endpoint to run local models and connect them to Open-WebUI, Hermes, and OpenCode. But since llama.cpp's webUI has been receiving a lot of updates, I took a look at its settings and noticed a particular one under developer options. This is the…

23
r/LocalLLaMA community 1mo ago

G4-MeroMero-26B-A4B-it-uncensored-heretic Is Out Now, a Finetune of gemma-4-26B-A4B-it, With KLD of 0.0152 and 12/100 Refusals!

When I previously posted the uncensored version of the 31B version of the MeroMero finetune, quite a few people asked for the 26B-A4B version, I wasn't so keen on it because I considered the 31B to be the better version, but I understand that people might want the 26B-A4B…

20
Hugging Face Daily Papers research 1mo ago

Live Music Diffusion Models: Efficient Fine-Tuning and Post-Training of Interactive Diffusion Music Generators

Abstract Audio diffusion models are adapted for interactive music generation through efficient block-wise processing and novel training paradigms that enable real-time performance on consumer hardware. AI-generated summary Interactive streaming music generation promises the use…

11
r/LocalLLaMA community 1mo ago

Low-level coding dataset

Hi all, I've recently been thinking about putting together a community sourced coding dataset for finetuning models, with a heavy focus on cpp and systems programming. My goal is to eventually have a model (say a finetune of Qwen3.6-27b) that is good at stuff like memory…

15
arXiv — Machine Learning research 1mo ago

From Parameters to Data: A Task-Parameter-Guided Fine-Tuning Pipeline for Efficient LLM Alignment

arXiv:2605.21558v1 Announce Type: new Abstract: Adapting Large Language Models (LLMs) to specialized domains typically incurs high data and computational overhead. While prior efficiency efforts have largely treated data selection and parameter-efficient fine-tuning as isolated…

38
arXiv — NLP / Computation & Language research 1mo ago

Token-weighted Direct Preference Optimization with Attention

arXiv:2605.21883v1 Announce Type: new Abstract: Direct Preference Optimization (DPO) aligns Large Language Models with human preferences without the need for a separate reward model. However, DPO treats all tokens in responses equally, neglecting the differing importance of…

5
arXiv — NLP / Computation & Language research 1mo ago

Modeling Pathology-Like Behavioral Patterns in Language Models Through Behavioral Fine-Tuning

arXiv:2605.22356v1 Announce Type: new Abstract: Large language models are increasingly used as computational tools for modeling human-like behavior. We introduce a behavioral induction framework that modifies model policies through fine-tuning on structured decision-making…

19
arXiv — NLP / Computation & Language research 1mo ago

Beyond Temperature: Hyperfitting as a Late-Stage Geometric Expansion

arXiv:2605.22579v1 Announce Type: new Abstract: Recent work has identified a counterintuitive phenomenon termed "Hyperfitting", where fine-tuning Large Language Models (LLMs) to near-zero training loss on small datasets surprisingly enhances open-ended generation quality and…

16
arXiv — NLP / Computation & Language research 1mo ago

Understanding Data Temporality Impact on Large Language Models Pre-training

arXiv:2605.22769v1 Announce Type: new Abstract: Large language models (LLMs) are typically trained on shuffled corpora, yielding models whose knowledge is frozen at train time and whose temporal grounding remains poorly understood. In this work, we study the impact of…

14
llama.cpp releases dev-tools 1mo ago

b9276

server: expose prompt token counts in /slots endpoint (#23454) Add n_prompt_tokens, n_prompt_tokens_processed, and n_prompt_tokens_cache to the /slots JSON response. These fields are already tracked internally but were not exposed, making it impossible for clients to monitor…

15
r/LocalLLaMA community 1mo ago

LatitudeGames/Equinox-31B · Hugging Face

New model from LatitudeGames - Gemma 31B finetune https://huggingface.co/LatitudeGames/Equinox-31B-GGUF Equinox draws its name from the balance between extremes. Trained on a balanced blend of Wayfarer 2's unforgiving dark adventures and Hearthfire's quiet slice-of-life…

14
r/LocalLLaMA community 1mo ago

I'm running an agentic system with kobold.cpp as my backend. Am I losing performance?

Currently, I'm running a Hermes agent with an OpenAI v1 compatible endpoint provided by Kobold. My setup is a 24GB 3090Ti + 512GB DDR4 running Qwen3.6-35B-A3B. I plan to move to a larger MoE model once I'm satisfied with how everything is working, but I'm just wondering if I'm…

33
Hugging Face Daily Papers research 1mo ago

Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining

Abstract A large-scale GUI dataset was created by automatically extracting interaction trajectories from internet videos, enabling improved performance in GUI agents through pre-training on this diverse collection. AI-generated summary Recent advances in multimodal large…

35
arXiv — Machine Learning research 1mo ago

Spectral Unforgetting: Post-Hoc Recovery of Damaged Capabilities Without Retraining

arXiv:2605.20296v1 Announce Type: new Abstract: Fine-tuning a language model for a target task routinely degrades capabilities the training data never explicitly threatened. We study this phenomenon, known as catastrophic forgetting, and propose a post-hoc repair solution that…

17
arXiv — Machine Learning research 1mo ago

Spectral Souping: A Unified Framework for Online Preference Alignment

arXiv:2605.20408v1 Announce Type: new Abstract: Reinforcement Learning from Human Feedback (RLHF) effectively aligns Large Language Models (LLMs) with aggregate human preferences but often fails to address the diverse and conflicting needs of individual users. To overcome this…

26
arXiv — Machine Learning research 1mo ago

An exponential mechanism based on quadratic approximations for fine-tuning machine learning models with privacy guarantees

arXiv:2605.20521v1 Announce Type: new Abstract: Fine-tuning adapts a pretrained machine learning model to a small, sensitive dataset, but this process risks memorizing individual new data points, making the model vulnerable to adversaries who seek to extract sensitive…

13
arXiv — Machine Learning research 1mo ago

Modular Multimodal Classification Without Fine-Tuning: A Simple Compositional Approach

arXiv:2605.20674v1 Announce Type: new Abstract: We introduce CoMET, Composing Modality Encoders with Tabular foundation models, a simple yet highly competitive method for multimodal classification: pass each modality through a frozen…

24
arXiv — NLP / Computation & Language research 1mo ago

FlowLM: Few-Step Language Modeling via Diffusion-to-Flow Adaptation

arXiv:2605.20199v1 Announce Type: new Abstract: We present FlowLM, a flow matching language model transformed from pre-trained diffusion language models via efficient fine-tuning. By re-aligning the curved sampling trajectories of diffusion models into straight-line flows,…

4
arXiv — NLP / Computation & Language research 1mo ago

Memory Grafting: Scaling Language Model Pre-training via Offline Conditional Memory

arXiv:2605.20948v1 Announce Type: new Abstract: Scaling conditional memory offers a promising way to increase language-model capacity, but existing methods such as Engram learn large memory tables from scratch during pre-training, making memory scaling expensive and sometimes…

12
arXiv — NLP / Computation & Language research 1mo ago

SymbolicLight V1: Spike-Gated Dual-Path Language Modeling with High Activation Sparsity and Sub-Billion-Scale Pre-Training Evidence

arXiv:2605.21333v1 Announce Type: new Abstract: Natively trained spiking language models struggle to combine Transformer-like language quality, stable multi-domain pre-training, and high activation sparsity. We present SymbolicLight V1, a spike-gated dual-path language model…

7
arXiv — NLP / Computation & Language research 1mo ago

SMoA: Spectrum Modulation Adapter for Parameter-Efficient Fine-Tuning

arXiv:2605.21147v1 Announce Type: cross Abstract: As the number of model parameters increases, parameter-efficient fine-tuning (PEFT) has become the go-to choice for tailoring pre-trained large language models. Low-rank Adaptation (LoRA) uses a low-rank update method to simulate…

24
Hugging Face Daily Papers research 1mo ago

Conditional Equivalence of DPO and RLHF: Implicit Assumption, Failure Modes, and Provable Alignment

Abstract Direct Preference Optimization (DPO) is theoretically equivalent to Reinforcement Learning from Human Feedback (RLHF) only under specific assumptions, otherwise optimizing different objectives; Constrained Preference Optimization (CPO) is proposed as a solution with…

17
Hugging Face Daily Papers research 1mo ago

Zero-Shot Sim-to-Real Robot Learning: A Dexterous Manipulation Study on Reactive Catching

Abstract Domain-Randomized Instance Set (DRIS) enables robust policy learning for dexterous manipulation tasks by simultaneously representing multiple randomized instances, achieving strong sim-to-real transfer without extensive real-world fine-tuning. AI-generated summary…

19
Hugging Face Daily Papers research 1mo ago

Why Do Reasoning Models Lose Coverage? The Role of Data and Forks in the Road

Abstract Reasoning models exhibit coverage shrinkage during supervised fine-tuning due to decision-point scenarios in training data, which can be mitigated through targeted data synthesis and diversity-encouraging decoding mechanisms. AI-generated summary Recent progress in…

30
r/LocalLLaMA community 1mo ago

A streamlined Hugging Face model search utility coded by Qwen 3.6-27B

Hi all. As some may have been aware, Hugging Face's model search had issues recently. (It seems to be resolved now though). I also often find myself struggling with the standard search interface when trying to find new derivative quants or finetunes of some particular models,…

24
Hugging Face Daily Papers research 1mo ago

Overcoming Catastrophic Forgetting in Visual Continual Learning with Reinforcement Fine-Tuning

Abstract Reinforcement Fine-Tuning suffers from catastrophic forgetting in visual continual learning, which is addressed through Retention-aware Policy Optimization that uses trajectory-level reward shaping and cross-task advantage normalization. AI-generated summary Recent…

11
arXiv — Machine Learning research 1mo ago

HELLoRA: Hot Experts Layer-Level Low-Rank Adaptation for Mixture-of-Experts Models

arXiv:2605.18795v1 Announce Type: new Abstract: Low-Rank Adaptation (LoRA) dominates parameter-efficient fine-tuning of large language models, yet most variants target dense architectures. Mixture-of-Experts (MoE) models scale parameters at near-constant per-token compute, and…

27
arXiv — Machine Learning research 1mo ago

DynaTrain: Fast Online Parallelism Switching for Elastic LLM Training

arXiv:2605.18815v1 Announce Type: new Abstract: Modern large language model (LLM) training is inherently dynamic: resource fluctuations, RLHF phase shifts, and cluster elasticity continually reshape the optimal parallelism layout, posing a significant challenge to existing…

22
arXiv — Machine Learning research 1mo ago

Hybrid-LoRA: Bridging Full Fine-Tuning and Low-Rank Adaptation for Post-Training

arXiv:2605.18822v1 Announce Type: new Abstract: Post-training has become essential for adapting large language models (LLMs) to complex downstream behaviors, including instruction following, preference alignment, and multi-step reasoning. Reinforcement learning with verifiable…

28
arXiv — Machine Learning research 1mo ago

TEMPO: Temporal Enforcement via Mode-Separated Policy Optimization for Trustworthy LLM Backtesting

arXiv:2605.18843v1 Announce Type: new Abstract: Backtesting large language models on historical events requires reasoning exclusively from information available before a specified cutoff date. Yet models routinely leak post-cutoff knowledge from pre-training into their…

37
arXiv — Machine Learning research 1mo ago

HypergraphFormer: Learning Hypergraphs from LLMs for Editable Floor Plan Generation

arXiv:2605.18932v1 Announce Type: new Abstract: In this work, we propose HypergraphFormer, a novel and efficient approach to floor plan generation based on learning hypergraph representations with a large language model (LLM). The model is trained via supervised fine-tuning to…

18
arXiv — Machine Learning research 1mo ago

Distilling Linearized Behavior for Effective Task Arithmetic

arXiv:2605.18993v1 Announce Type: new Abstract: Task vector composition has emerged as a promising paradigm for editing pre-trained models, enabling model merging through addition and unlearning through subtraction. Fine-tuning in the tangent space of a pre-trained model (linear…

20
arXiv — Machine Learning research 1mo ago

LoRA vs. Full Fine-Tuning: A Theoretical Perspective

arXiv:2605.19018v1 Announce Type: new Abstract: Fine-tuning adapts a pre-trained model to downstream tasks using a small amount of labeled data. Low-Rank Adaptation (LoRA) is an efficient fine-tuning method that reduces memory and computation costs while often achieving…

25
arXiv — Machine Learning research 1mo ago

Learning When to Adapt

arXiv:2605.19028v1 Announce Type: new Abstract: Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning method, yet its learned correction is static: the same low-rank update is applied to every input. This input-agnostic approach creates an inevitable…

38
