Tag

Training

450 articles archived under #training · RSS

r/LocalLLaMA community 4d ago

A Blind Visual Paradigm for Testing Skill Transfer in Small Models Without Fine-Tuning

TL;DR: Small models aren't dumb, they're shallow. I designed a cross-domain, blind, visual experiment to see if a large model can compress its "planning discipline" into a reusable scaffold that makes a small model deeper — with zero fine-tuning. Three.js is the testbed because…

28
r/LocalLLaMA community 4d ago

I built a tool to turn your Claude Code sessions into fine-tuning data for local models

If you use Claude Code, every session is already sitting on disk as a .jsonl file under ~/.claude/projects/. It has real coding conversations: multi-turn edits, tool calls, reasoning traces. That's training data you already generated for free. The problem is the format is not…

36
r/LocalLLaMA community 4d ago

Anyone still doing fine-tunes on consumer grade hardware?

Felt like there used to be a thriving fine-tuning community a few years back - and then once we started getting models that were smart enough and generalist enough (i.e. post Llama-3-8b era) things kind of dropped off a little. Less need for fine-tunes when prompt-tweaking can…

22
r/LocalLLaMA community 5d ago

Are there any qwen finetunes that were genuinely stronger than the base?

It's pretty popular to finetune qwen models but I never hear anyone say anything positive about them. Submitted by /u/MrMrsPotts

30
Hugging Face Daily Papers research 6d ago

How Post-Training Shapes Biological Reasoning Models

Abstract Post-training stages in biological reasoning models differently affect generalization, with continued pre-training aligning models with biological language, supervised fine-tuning improving in-domain performance but reducing out-of-domain generalization, and…

8
arXiv — Machine Learning research 6d ago

SSM Adapters via Hankel Reduced-order Modeling: Injection Site Determines Task Suitability in Long-Context Fine-Tuning

arXiv:2606.26290v1 Announce Type: new Abstract: While parameter-efficient fine-tuning (PEFT) typically targets attention projectors, its efficacy for tasks requiring sequential state accumulation remains under-explored. We examine if PEFT for such tasks can benefit from state…

18
arXiv — Machine Learning research 6d ago

At the Edge of Understanding: Sparse Autoencoders Trace The Limits of Transformer Generalization

arXiv:2606.26396v1 Announce Type: new Abstract: Pre-trained transformers have demonstrated remarkable generalization abilities, at times extending beyond the scope of their training data. Yet, real-world deployments often face unexpected or adversarial data that diverges from…

34
arXiv — Machine Learning research 6d ago

Localizing RL-Induced Tool Use to a Single Crosscoder Feature

arXiv:2606.26474v1 Announce Type: new Abstract: Fine-tuning through RL reshapes the internal representations of language models to enable agentic behaviors such as tool use, yet the mechanistic basis of these changes remains poorly understood. While RL substantially improves…

4
arXiv — Machine Learning research 6d ago

Reasoning Quality Emerges Early: Data Curation for Reasoning Models

arXiv:2606.26797v1 Announce Type: new Abstract: Supervised fine-tuning (SFT) on a small, high-quality set of long reasoning traces is an effective approach for eliciting strong reasoning capabilities in Large Language Models (LLMs). However, existing methods for curating…

14
arXiv — Machine Learning research 6d ago

Designing Reward Signals for Portable Query Generation: A Case Study in Industrial Semantic Job Search

arXiv:2606.27291v1 Announce Type: new Abstract: Job-search platforms rely on low-bandwidth query interfaces that often fail to capture the high-dimensional complexity of candidate profiles. We present an end-to-end RLAIF (Reinforcement Learning from AI Feedback) framework to…

10
arXiv — NLP / Computation & Language research 6d ago

Helpfulness Hurts: Domain-Dependent Degradation of Mid-Trained Compassion Values Under Post-Training

arXiv:2606.26102v1 Announce Type: new Abstract: Standard post-training pipelines apply supervised fine-tuning (SFT) and reinforcement learning (RL) to make language models helpful, but these processes may inadvertently degrade values instilled during pre-training. We investigate…

22
arXiv — NLP / Computation & Language research 6d ago

Closing the Quality Gap in Low-Resource Text-to-Speech: LoRA Fine-Tuning of VoxCPM2 for Khmer and Korean

arXiv:2606.26618v1 Announce Type: new Abstract: Large pretrained text-to-speech (TTS) models sound almost human for well-resourced languages, but much worse for languages that are rare in their training data. We study this quality gap for Khmer and Korean using VoxCPM2, a…

26
arXiv — NLP / Computation & Language research 6d ago

Improving General Role-Playing Agents via Psychology-Grounded Reasoning and Role-Aware Policy Optimization

arXiv:2606.27025v1 Announce Type: new Abstract: Building general-purpose role-playing agents that faithfully portray any character from a natural-language profile remains challenging. The dominant paradigm, supervised fine-tuning, encourages behavioral mimicry without deep,…

16
r/LocalLLaMA community 6d ago

When you don't have a data center GPU

Please don't tell me someone is going to (yet again) reply with the longest finetune-merge name in eternity... Submitted by /u/Iwaku_Real

4
Hugging Face official-blog 6d ago

Run a vLLM Server on HF Jobs in One Command

Published June 26, 2026 · Quentin Gallouédec (qgallouedec). You can spin up a private, OpenAI-compatible LLM endpoint on Hugging Face infrastructure with a single command: no servers…

18
r/LocalLLaMA community 6d ago

Qwen 3.6 27b GLM 5.2 fine-tune?

Hi everyone, since both models are open weights and GLM seems to have found the secret to frontier-model reasoning, why don't we see any Qwen GLM fine-tune yet? Is it because GLM 5.2 is recent and fine-tunes and datasets take time, or is the community just not interested in the fine-tune?…

28
r/LocalLLaMA community 6d ago

DGX Spark OS lifetime?

I'm thinking of purchasing 2 DGX Sparks for my office (because a 700+W workstation would be intolerable) for LLM-centric work (inference only, no fine-tuning). I know the OS is based on Ubuntu 24.04. Has Nvidia ever disclosed what the lifetime of the OS is? Meaning, is there a chance…

17
r/MachineLearning community 6d ago

[R] Compiling Agentic Workflows into LLM Weights: Near-Frontier Quality at Two Orders of Magnitude Less Cost

Token-based billing is causing my company to reevaluate small language models. I came across this paper that shows SLM supervised fine-tuning on traces from orchestration of frontier models can be nearly as performant and much cheaper. Has anyone tried this in the real world?…

34
arXiv — Machine Learning research 7d ago

Retrieval-Augmented Personalization with Foundation Models for Wearable Stress Detection

arXiv:2606.24985v1 Announce Type: new Abstract: Personalization in wearable-based stress detection remains challenging due to substantial inter-individual variability in physiological and behavioral responses. While traditional approaches rely on user-specific fine-tuning or…

5
arXiv — Machine Learning research 7d ago

The Geometry of Sequential Learning: Lie-Bracket Prediction of Transfer Order

arXiv:2606.24993v1 Announce Type: new Abstract: Sequential learning is order-dependent: from Pile-style next-token domain adaptation to instruction-SFT and DPO, N candidate sources induce N! possible curricula. We show that the local order effect is governed by a computable…

7
arXiv — NLP / Computation & Language research 7d ago

Neural Scaling Universality: If Exponents Are Fixed, Time to Understand Coefficients

arXiv:2606.25008v1 Announce Type: cross Abstract: Neural scaling laws describe how pre-training loss decays as power laws with training time, model size, and compute. This position paper argues that the exponents of these power laws are fixed by generic mechanisms: a one-third…

13
arXiv — NLP / Computation & Language research 7d ago

Detect, Unlearn, Restore: Defending Text Summarization Models Against Data Poisoning

arXiv:2606.26036v1 Announce Type: new Abstract: Training-time data poisoning during fine-tuning poses a significant threat to large language models (LLMs) deployed for abstractive text summarization, where small task-specific datasets exert disproportionate influence on model…

26
arXiv — NLP / Computation & Language research 7d ago

Does Translation-Enhanced Speech Encoder Pre-training Affect Speech LLMs?

arXiv:2606.25444v1 Announce Type: cross Abstract: Connecting a pre-trained speech encoder to a Large Language Model (LLM) is the standard architecture for building Speech LLMs. However, a structural misalignment exists between the encoder and the LLM. Unlike encoders based on…

23
arXiv — NLP / Computation & Language research 7d ago

Scale or Reason? A Compute-Equivalent Analysis of Reasoning Distillation

arXiv:2509.22193v2 Announce Type: replace Abstract: Distilling reasoning traces from strong teacher models has become the standard recipe for building capable small language models. Yet reasoning traces are 5-20× longer than standard instruction fine-tuning (IFT) outputs,…

19
r/LocalLLaMA community 7d ago

Gemma4-26B-A4B & 31B-QAT Uncensored Balanced are out with MTP (35% & 53% speed boost)!

First of all, I'm stoked to announce we are almost at 20 million downloads on HF! (counted only on my own account, no duplicates/quants/finetunes/etc) and almost 5000 members on Discord! Two releases this time, as promised, the bigger Gemma 4 QATs, both Balanced, both with MTP :…

6
r/MachineLearning community 7d ago

I made a superhuman Generals.io agent with self-play RL [P]

Hi everyone, I trained a self-play RL agent for Generals.io that reached superhuman-level and ranked #1 on the human 1v1 leaderboard. It began as my master's thesis where the goal was to beat a prior algorithm based agent. We succeeded using behavior cloning, RL fine-tuning and…

6
Hugging Face official-blog 7d ago

Accelerating Transformers Fine-Tuning with NVIDIA NeMo AutoModel

Enterprise article. Published June 24, 2026 · Adil Asif (adil-asif, NVIDIA), Alexandros Koumparoulis (akoumpa, NVIDIA), Wenwen Gao (wgao2021, NVIDIA), Sylendran Arunagiri (Sylendran95, NVIDIA)…

29
arXiv — Machine Learning research 8d ago

Weight-Space Geometry of Offline Reasoning Training

arXiv:2606.23740v1 Announce Type: new Abstract: Offline reinforcement-learning losses (RFT, RIFT, DFT, Offline GRPO, DPO) are widely used to distill reasoning from large teachers into smaller students, and are typically compared on downstream accuracy alone. We ask whether they…

6
arXiv — NLP / Computation & Language research 8d ago

When Top-1 Fails: Calibrating LoRA Monitors for Masked Diffusion LMs

arXiv:2606.24119v1 Announce Type: cross Abstract: Discrete diffusion language model (DLM) fine-tuning inherits inexpensive diagnostics from denoising-time confidence monitors, but their PEFT-training meaning is untested. We test top-1 argmax concentration as a collapse warning.…

12
arXiv — NLP / Computation & Language research 8d ago

Holistic Data Scheduler for LLM Pre-training via Multi-Objective Reinforcement Learning

arXiv:2606.24133v1 Announce Type: cross Abstract: The composition of training data, governed by the diversity of sources and their mixing strategy, is a cornerstone of Large Language Model (LLM) pre-training. Online Data Mixing (ODM), the technique of adaptively adjusting data…

13
arXiv — NLP / Computation & Language research 8d ago

Matching Tasks to Objectives: Fine-Tuning and Prompt-Tuning Strategies for Encoder-Decoder Pre-trained Language Models

arXiv:2606.24841v1 Announce Type: cross Abstract: Prompt-based learning has emerged as a dominant paradigm in natural language processing. This study explores the impact of diverse pre-training objectives on the performance of encoder-decoder pre-trained language models across…

18
Hugging Face Daily Papers research 8d ago

Holistic Data Scheduler for LLM Pre-training via Multi-Objective Reinforcement Learning

Abstract: A novel online data mixing framework called Holistic Data Scheduler uses reinforcement learning with a multi-objective reward function to optimize large language model pre-training efficiency and performance. (Summary generated by Qwen/Qwen2.5-Coder-32B-Instruct.) The composition…

38
r/LocalLLaMA community 9d ago

Training a Qwen 3.5 4B/9B agent for multi-tool use: SFT first or go directly to RL?

I'm looking to train Qwen 3.5 4B or 9B for a custom multi-tool agent workflow and would appreciate guidance from people who have done this successfully. A few questions: SFT → RL or RL-only? Is it still recommended to first do supervised fine-tuning (tool-calling traces, reasoning…

15
r/LocalLLaMA community 9d ago

Is Gemma 4 going to be the next Mistral (or Qwen3.6) one day? Concerning the lack of finetunes

https://eqbench.com/creative_writing.html#:~:text=gemma%2D4%2D31B,Sample From what I've seen, Gemma 4 has better everything (especially long-context adherence) EXCEPT for the raw prosing performance of Mistral... finetunes. Comparing bases only, Mistral Small 3.2 (the…

5
Hugging Face Daily Papers research 12d ago

No Resource, No Benchmarks, No Problem? Evaluating and Improving LLMs for Code Generation in No-Resource Languages

Abstract Research addresses code generation challenges for no-resource programming languages by developing benchmarks and proposing a method that combines further pre-training with weight difference transfer to create specialized instruction-following models at reduced…

27
Smol AI News news-outlet 13d ago

not much happened today

GLM-5.2 emerges as a leading open-weight coding model rivaling Opus 4.8 and GPT-5.5 in software engineering tasks, emphasizing the strategic importance of open models for provider competition, on-prem deployment, and fine-tuning rights. Experts like Patrick…

17
arXiv — Machine Learning research 13d ago

Spectral DPPs via NEPv: A Scalable Continuous Relaxation of Determinantal MAP for Diversity-Aware Data Selection

arXiv:2606.19411v1 Announce Type: new Abstract: Selecting a small, diverse, high-quality subset from a massive pool of candidates is a recurring primitive in modern machine learning: data curation and coreset selection for training and fine-tuning large models, active-learning…

32
arXiv — Machine Learning research 13d ago

Techniques for Peak Memory Reduction for LoRA Fine-tuning of LLMs on Edge Devices

arXiv:2606.19528v1 Announce Type: new Abstract: Fine-tuning of Large Language Models (LLMs) using Low-Rank Adaptation (LoRA) on an end-user's data offers personalized experiences while keeping data private, but faces severe memory constraints on consumer hardware. Peak memory…

15
arXiv — Machine Learning research 13d ago

Tracking Representation Dynamics in Large Language Models with Persistent Homology

arXiv:2606.19542v1 Announce Type: new Abstract: Large language models are commonly aligned through supervised fine-tuning, yet little is known about how their internal representations evolve during this process. We study alignment dynamics using persistent homology by tracking…

38
arXiv — Machine Learning research 13d ago

Predicting Mergeability of Parameter-Efficient Fine-Tuning Updates

arXiv:2606.19549v1 Announce Type: new Abstract: Low-rank adaptation (LoRA) makes it cheap to train many domain- and task-specific language model adapters, but whether two adapters can be merged is usually discovered only after both have been fully trained and evaluated. This…

7
arXiv — Machine Learning research 13d ago

Uncertainty-Aware Reward Modeling for Stable RLHF

arXiv:2606.19818v1 Announce Type: new Abstract: Reinforcement learning from human feedback (RLHF) aligns large language models by training reward models on preference data and optimizing policies to maximize predicted rewards. However, this pipeline faces two fundamental…

4
arXiv — Machine Learning research 13d ago

Multi-Modal Contrastive Learning for Implicit Earth Embeddings via Location Tying

arXiv:2606.20167v1 Announce Type: new Abstract: Spatial prediction tasks are often limited by a lack of high-quality labelled ground-truth observations. To overcome this challenge, self-supervised pre-training is a possible solution, with contrastive learning dominant for…

6
arXiv — NLP / Computation & Language research 13d ago

Disentangling Linguistic Relatedness from Task Alignment in Cross-Lingual Transfer

arXiv:2606.19346v1 Announce Type: new Abstract: We study cross-lingual transfer by fine-tuning seven large language models (4B–671B parameters) on Arabic and evaluating zero-shot reading comprehension on Semitic languages and non-Semitic controls. Across dense and…

6
arXiv — NLP / Computation & Language research 13d ago

Clusters are All You Need: Pre-Training the Tsetlin Machine with Semantic Clusters from Language Models for Interpretability

arXiv:2606.19815v1 Announce Type: new Abstract: Pre-trained language models such as BERT achieve strong text classification performance but lack transparency, limiting their use in high-stakes settings. The Tsetlin Machine (TM) offers fully interpretable, clause-based reasoning…

25
arXiv — NLP / Computation & Language research 13d ago

Actionable Activation Directions for Detecting and Mitigating Emergent Misalignment Across Language Model Families

arXiv:2606.20225v1 Announce Type: new Abstract: Fine-tuning language models on insecure code induces emergent misalignment with poorly understood internal structure. We investigate whether this misalignment corresponds to a causally actionable activation-space direction shared…

31
arXiv — NLP / Computation & Language research 13d ago

MENTOR: Reinforcement Learning via Flexible Teacher-Optimized Rewards for Tool-Use Distillation

arXiv:2510.18383v3 Announce Type: replace Abstract: Distilling the tool-use capabilities of large language models (LLMs) into small language models (SLMs) is essential for their practical application. The predominant approach, supervised fine-tuning (SFT), suffers from poor…

20
arXiv — NLP / Computation & Language research 13d ago

Improving Alignment Between Human and Machine Codes: An Empirical Assessment of Prompt Engineering for Construct Identification in Psychology

arXiv:2512.03818v2 Announce Type: replace Abstract: Due to their architecture and vast pre-training data, large language models (LLMs) demonstrate strong text classification performance. However, LLM output - here, the category assigned to a text - depends heavily on the wording…

33
llama.cpp releases dev-tools 13d ago

b9714

server: add "X-Accel-Buffering": "no" header to streaming endpoints (#24774). This header tells Nginx (as a reverse proxy) not to buffer responses (only affects streaming endpoints). Without it, Nginx will…

11
arXiv — Machine Learning research 14d ago

CODEBLOCK: Learning to Supervise Code at the Right Granularity

arXiv:2606.18286v1 Announce Type: new Abstract: Supervised fine-tuning of code LLMs typically applies uniform cross-entropy loss to all response tokens, implicitly assuming that every token provides equally useful learning signal. Recent token-level selection methods challenge…

34
arXiv — Machine Learning research 14d ago

DRIFT: Refining Instruction Data via On-Policy Data Attribution

arXiv:2606.18307v1 Announce Type: new Abstract: Optimizing the training data distribution for Supervised Fine-Tuning (SFT) dictates the capability of Large Language Models (LLMs). While existing data curation methods excel at accelerating training under constrained budgets, they…

23
