Tag: Training · 450 articles archived under #training
r/LocalLLaMA community 4d ago A Blind Visual Paradigm for Testing Skill Transfer in Small Models Without Fine-Tuning TL;DR: Small models aren't dumb, they're shallow. I designed a cross-domain, blind, visual experiment to see if a large model can compress its "planning discipline" into a reusable scaffold that makes a small model deeper — with zero fine-tuning. Three.js is the testbed because… 28
r/LocalLLaMA community 4d ago I built a tool to turn your Claude Code sessions into fine-tuning data for local models If you use Claude Code, every session is already sitting on disk as a .jsonl file under ~/.claude/projects/. It has real coding conversations: multi-turn edits, tool calls, reasoning traces. That's training data you already generated for free. The problem is the format is not… 36
r/LocalLLaMA community 4d ago Anyone still doing fine-tunes on consumer grade hardware? Felt like there used to be a thriving fine-tuning community a few years back - and then once we started getting models that were smart enough and generalist enough (i.e. post Llama-3-8b era) things kind of dropped off a little. Less need for fine-tunes when prompt-tweaking can… 22
r/LocalLLaMA community 5d ago Are there any qwen finetunes that were genuinely stronger than the base? It's pretty popular to finetune qwen models but I never hear anyone say anything positive about them.
submitted by /u/MrMrsPotts 30
Hugging Face Daily Papers research 6d ago How Post-Training Shapes Biological Reasoning Models Abstract: Post-training stages in biological reasoning models differently affect generalization, with continued pre-training aligning models with biological language, supervised fine-tuning improving in-domain performance but reducing out-of-domain generalization, and… 8
arXiv — Machine Learning research 6d ago SSM Adapters via Hankel Reduced-order Modeling: Injection Site Determines Task Suitability in Long-Context Fine-Tuning arXiv:2606.26290v1 Announce Type: new Abstract: While parameter-efficient fine-tuning (PEFT) typically targets attention projectors, its efficacy for tasks requiring sequential state accumulation remains under-explored. We examine if PEFT for such tasks can benefit from state… 18
arXiv — Machine Learning research 6d ago At the Edge of Understanding: Sparse Autoencoders Trace The Limits of Transformer Generalization arXiv:2606.26396v1 Announce Type: new Abstract: Pre-trained transformers have demonstrated remarkable generalization abilities, at times extending beyond the scope of their training data. Yet, real-world deployments often face unexpected or adversarial data that diverges from… 34
arXiv — Machine Learning research 6d ago Localizing RL-Induced Tool Use to a Single Crosscoder Feature arXiv:2606.26474v1 Announce Type: new Abstract: Fine-tuning through RL reshapes the internal representations of language models to enable agentic behaviors such as tool use, yet the mechanistic basis of these changes remains poorly understood.
While RL substantially improves… 4
arXiv — Machine Learning research 6d ago Reasoning Quality Emerges Early: Data Curation for Reasoning Models arXiv:2606.26797v1 Announce Type: new Abstract: Supervised fine-tuning (SFT) on a small, high-quality set of long reasoning traces is an effective approach for eliciting strong reasoning capabilities in Large Language Models (LLMs). However, existing methods for curating… 14
arXiv — Machine Learning research 6d ago Designing Reward Signals for Portable Query Generation: A Case Study in Industrial Semantic Job Search arXiv:2606.27291v1 Announce Type: new Abstract: Job-search platforms rely on low-bandwidth query interfaces that often fail to capture the high-dimensional complexity of candidate profiles. We present an end-to-end RLAIF (Reinforcement Learning from AI Feedback) framework to… 10
arXiv — NLP / Computation & Language research 6d ago Helpfulness Hurts: Domain-Dependent Degradation of Mid-Trained Compassion Values Under Post-Training arXiv:2606.26102v1 Announce Type: new Abstract: Standard post-training pipelines apply supervised fine-tuning (SFT) and reinforcement learning (RL) to make language models helpful, but these processes may inadvertently degrade values instilled during pre-training. We investigate… 22
arXiv — NLP / Computation & Language research 6d ago Closing the Quality Gap in Low-Resource Text-to-Speech: LoRA Fine-Tuning of VoxCPM2 for Khmer and Korean arXiv:2606.26618v1 Announce Type: new Abstract: Large pretrained text-to-speech (TTS) models sound almost human for well-resourced languages, but much worse for languages that are rare in their training data.
We study this quality gap for Khmer and Korean using VoxCPM2, a… 26
arXiv — NLP / Computation & Language research 6d ago Improving General Role-Playing Agents via Psychology-Grounded Reasoning and Role-Aware Policy Optimization arXiv:2606.27025v1 Announce Type: new Abstract: Building general-purpose role-playing agents that faithfully portray any character from a natural-language profile remains challenging. The dominant paradigm -- supervised fine-tuning -- encourages behavioral mimicry without deep,… 16
r/LocalLLaMA community 6d ago When you don't have a data center GPU Please don't tell me someone is going to (yet again) reply with the longest finetune-merge name in eternity... submitted by /u/Iwaku_Real 4
Hugging Face official-blog 6d ago Run a vLLM Server on HF Jobs in One Command Published June 26, 2026. Quentin Gallouédec (qgallouedec). You can spin up a private, OpenAI-compatible LLM endpoint on Hugging Face infrastructure with a single command — no servers… 18
r/LocalLLaMA community 6d ago Qwen 3.6 27b GLM 5.2 fine-tune? Hi everyone, Since both models are open weights and GLM seems to find that secret to frontier model reasoning, why don't we see any Qwen GLM finetune yet? Is it because GLM 5.2 is recent and finetune and datasets take time or the community is just not interested in the finetune?… 28
r/LocalLLaMA community 6d ago DGX Spark OS lifetime? I think of purchasing 2 DGX Sparks for my office (because a 700+W workstation would be intolerable) for LLM-centric work (inference only, no fine-tuning). I know the OS is based on Ubuntu 24.04. Has Nvidia ever disclosed what is the lifetime of the OS?
Meaning, is there a chance… 17
r/MachineLearning community 6d ago [R] Compiling Agentic Workflows into LLM Weights: Near-Frontier Quality at Two Orders of Magnitude Less Cost Token-based billing is causing my company to reevaluate small language models. I came across this paper that shows SLM supervised fine-tuning on traces from orchestration of frontier models can be nearly as performant and much cheaper. Has anyone tried this in the real world? … 34
arXiv — Machine Learning research 7d ago Retrieval-Augmented Personalization with Foundation Models for Wearable Stress Detection arXiv:2606.24985v1 Announce Type: new Abstract: Personalization in wearable-based stress detection remains challenging due to substantial inter-individual variability in physiological and behavioral responses. While traditional approaches rely on user-specific fine-tuning or… 5
arXiv — Machine Learning research 7d ago The Geometry of Sequential Learning: Lie-Bracket Prediction of Transfer Order arXiv:2606.24993v1 Announce Type: new Abstract: Sequential learning is order-dependent: from Pile-style next-token domain adaptation to instruction-SFT and DPO, N candidate sources induce N! possible curricula. We show that the local order effect is governed by a computable… 7
arXiv — NLP / Computation & Language research 7d ago Neural Scaling Universality: If Exponents Are Fixed, Time to Understand Coefficients arXiv:2606.25008v1 Announce Type: cross Abstract: Neural scaling laws describe how pre-training loss decays as power laws with training time, model size, and compute.
This position paper argues that the exponents of these power laws are fixed by generic mechanisms: a one-third… 13
arXiv — NLP / Computation & Language research 7d ago Detect, Unlearn, Restore: Defending Text Summarization Models Against Data Poisoning arXiv:2606.26036v1 Announce Type: new Abstract: Training-time data poisoning during fine-tuning poses a significant threat to large language models (LLMs) deployed for abstractive text summarization, where small task-specific datasets exert disproportionate influence on model… 26
arXiv — NLP / Computation & Language research 7d ago Does Translation-Enhanced Speech Encoder Pre-training Affect Speech LLMs? arXiv:2606.25444v1 Announce Type: cross Abstract: Connecting a pre-trained speech encoder to a Large Language Model (LLM) is the standard architecture for building Speech LLMs. However, a structural misalignment exists between the encoder and the LLM. Unlike encoders based on… 23
arXiv — NLP / Computation & Language research 7d ago Scale or Reason? A Compute-Equivalent Analysis of Reasoning Distillation arXiv:2509.22193v2 Announce Type: replace Abstract: Distilling reasoning traces from strong teacher models has become the standard recipe for building capable small language models. Yet reasoning traces are 5-20× longer than standard instruction fine-tuning (IFT) outputs,… 19
r/LocalLLaMA community 7d ago Gemma4-26B-A4B & 31B-QAT Uncensored Balanced are out with MTP (35% & 53% speed boost)! First of all, I'm stoked to announce we are almost at 20 million downloads on HF! (counted only on my own account, no duplicates/quants/finetunes/etc) and almost 5000 members on Discord! Two releases this time, as promised, the bigger Gemma 4 QATs, both Balanced, both with MTP:… 6
r/MachineLearning community 7d ago I made a superhuman Generals.io agent with self-play RL [P] Hi everyone, I trained a self-play RL agent for Generals.io that reached superhuman-level and ranked #1 on the human 1v1 leaderboard.
It began as my master's thesis where the goal was to beat a prior algorithm based agent. We succeeded using behavior cloning, RL fine-tuning and… 6
Hugging Face official-blog 7d ago Accelerating Transformers Fine-Tuning with NVIDIA NeMo AutoModel Published June 24, 2026. Adil Asif (adil-asif, nvidia), Alexandros Koumparoulis (akoumpa, nvidia), Wenwen Gao (wgao2021, nvidia), Sylendran Arunagiri (Sylendran95, nvidia)… 29
arXiv — Machine Learning research 8d ago Weight-Space Geometry of Offline Reasoning Training arXiv:2606.23740v1 Announce Type: new Abstract: Offline reinforcement-learning losses (RFT, RIFT, DFT, Offline GRPO, DPO) are widely used to distill reasoning from large teachers into smaller students, and are typically compared on downstream accuracy alone. We ask whether they… 6
arXiv — NLP / Computation & Language research 8d ago When Top-1 Fails: Calibrating LoRA Monitors for Masked Diffusion LMs arXiv:2606.24119v1 Announce Type: cross Abstract: Discrete diffusion language model (DLM) fine-tuning inherits inexpensive diagnostics from denoising-time confidence monitors, but their PEFT-training meaning is untested. We test top-1 argmax concentration as a collapse warning.… 12
arXiv — NLP / Computation & Language research 8d ago Holistic Data Scheduler for LLM Pre-training via Multi-Objective Reinforcement Learning arXiv:2606.24133v1 Announce Type: cross Abstract: The composition of training data, governed by the diversity of sources and their mixing strategy, is a cornerstone of Large Language Model (LLM) pre-training.
Online Data Mixing (ODM), the technique of adaptively adjusting data… 13
arXiv — NLP / Computation & Language research 8d ago Matching Tasks to Objectives: Fine-Tuning and Prompt-Tuning Strategies for Encoder-Decoder Pre-trained Language Models arXiv:2606.24841v1 Announce Type: cross Abstract: Prompt-based learning has emerged as a dominant paradigm in natural language processing. This study explores the impact of diverse pre-training objectives on the performance of encoder-decoder pre-trained language models across… 18
Hugging Face Daily Papers research 8d ago Holistic Data Scheduler for LLM Pre-training via Multi-Objective Reinforcement Learning Abstract: A novel online data mixing framework called Holistic Data Scheduler uses reinforcement learning with a multi-objective reward function to optimize large language model pre-training efficiency and performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct The composition… 38
r/LocalLLaMA community 9d ago Training a Qwen 3.5 4B/9B agent for multi-tool use: SFT first or go directly to RL? To train Qwen 3.5 4B or 9B for a custom multi-tool agent workflow and would appreciate guidance from people who have done this successfully. A few questions: SFT → RL or RL-only? - Is it still recommended to first do supervised fine-tuning (tool-calling traces, reasoning… 15
r/LocalLLaMA community 9d ago Is Gemma 4 going to be the next Mistral (or Qwen3.6) one day? Concerning the lack of finetunes https://eqbench.com/creative_writing.html#:~:text=gemma%2D4%2D31B,Sample From what I've seen Gemma 4 has better everything (especially long-context adherence) EXCEPT for the raw prosing performance of Mistral... finetunes. Comparing bases only, Mistral Small 3.2 (the… 5
Hugging Face Daily Papers research 12d ago No Resource, No Benchmarks, No Problem?
Evaluating and Improving LLMs for Code Generation in No-Resource Languages Abstract: Research addresses code generation challenges for no-resource programming languages by developing benchmarks and proposing a method that combines further pre-training with weight difference transfer to create specialized instruction-following models at reduced… 27
Smol AI News news-outlet 13d ago not much happened today GLM-5.2 emerges as a leading open-weight coding model rivaling Opus 4.8 and GPT-5.5 in software engineering tasks, emphasizing the strategic importance of open models for provider competition, on-prem deployment, and fine-tuning rights. Experts like Patrick… 17
arXiv — Machine Learning research 13d ago Spectral DPPs via NEPv: A Scalable Continuous Relaxation of Determinantal MAP for Diversity-Aware Data Selection arXiv:2606.19411v1 Announce Type: new Abstract: Selecting a small, diverse, high-quality subset from a massive pool of candidates is a recurring primitive in modern machine learning -- data curation and coreset selection for training and fine-tuning large models, active-learning… 32
arXiv — Machine Learning research 13d ago Techniques for Peak Memory Reduction for LoRA Fine-tuning of LLMs on Edge Devices arXiv:2606.19528v1 Announce Type: new Abstract: Fine-tuning of Large Language Models (LLMs) using Low-Rank Adaptation (LoRA) on an end-user's data offers personalized experiences while keeping data private, but faces severe memory constraints on consumer hardware. Peak memory… 15
arXiv — Machine Learning research 13d ago Tracking Representation Dynamics in Large Language Models with Persistent Homology arXiv:2606.19542v1 Announce Type: new Abstract: Large language models are commonly aligned through supervised fine-tuning, yet little is known about how their internal representations evolve during this process.
We study alignment dynamics using persistent homology by tracking… 38
arXiv — Machine Learning research 13d ago Predicting Mergeability of Parameter-Efficient Fine-Tuning Updates arXiv:2606.19549v1 Announce Type: new Abstract: Low-rank adaptation (LoRA) makes it cheap to train many domain- and task-specific language model adapters, but whether two adapters can be merged is usually discovered only after both have been fully trained and evaluated. This… 7
arXiv — Machine Learning research 13d ago Uncertainty-Aware Reward Modeling for Stable RLHF arXiv:2606.19818v1 Announce Type: new Abstract: Reinforcement learning from human feedback (RLHF) aligns large language models by training reward models on preference data and optimizing policies to maximize predicted rewards. However, this pipeline faces two fundamental… 4
arXiv — Machine Learning research 13d ago Multi-Modal Contrastive Learning for Implicit Earth Embeddings via Location Tying arXiv:2606.20167v1 Announce Type: new Abstract: Spatial prediction tasks are often limited by a lack of high-quality labelled ground-truth observations. To overcome this challenge, self-supervised pre-training is a possible solution, with contrastive learning dominant for… 6
arXiv — NLP / Computation & Language research 13d ago Disentangling Linguistic Relatedness from Task Alignment in Cross-Lingual Transfer arXiv:2606.19346v1 Announce Type: new Abstract: We study cross-lingual transfer by fine-tuning seven large language models (4B--671B parameters) on Arabic and evaluating zero-shot reading comprehension on Semitic languages and non-Semitic controls.
Across dense and… 6
arXiv — NLP / Computation & Language research 13d ago Clusters are All You Need: Pre-Training the Tsetlin Machine with Semantic Clusters from Language Models for Interpretability arXiv:2606.19815v1 Announce Type: new Abstract: Pre-trained language models such as BERT achieve strong text classification performance but lack transparency, limiting their use in high-stakes settings. The Tsetlin Machine (TM) offers fully interpretable, clause-based reasoning… 25
arXiv — NLP / Computation & Language research 13d ago Actionable Activation Directions for Detecting and Mitigating Emergent Misalignment Across Language Model Families arXiv:2606.20225v1 Announce Type: new Abstract: Fine-tuning language models on insecure code induces emergent misalignment with poorly understood internal structure. We investigate whether this misalignment corresponds to a causally actionable activation-space direction shared… 31
arXiv — NLP / Computation & Language research 13d ago MENTOR: Reinforcement Learning via Flexible Teacher-Optimized Rewards for Tool-Use Distillation arXiv:2510.18383v3 Announce Type: replace Abstract: Distilling the tool-use capabilities of large language models (LLMs) into small language models (SLMs) is essential for their practical application. The predominant approach, supervised fine-tuning (SFT), suffers from poor… 20
arXiv — NLP / Computation & Language research 13d ago Improving Alignment Between Human and Machine Codes: An Empirical Assessment of Prompt Engineering for Construct Identification in Psychology arXiv:2512.03818v2 Announce Type: replace Abstract: Due to their architecture and vast pre-training data, large language models (LLMs) demonstrate strong text classification performance.
However, LLM output - here, the category assigned to a text - depends heavily on the wording… 33
llama.cpp releases dev-tools 13d ago b9714 server: add "X-Accel-Buffering": "no" header to streaming endpoints (#24774) This header tells Nginx (as a reverse proxy) to NOT buffer responses. (only affects streaming endpoints) Without it, Nginx will… 11
arXiv — Machine Learning research 14d ago CODEBLOCK: Learning to Supervise Code at the Right Granularity arXiv:2606.18286v1 Announce Type: new Abstract: Supervised fine-tuning of code LLMs typically applies uniform cross-entropy loss to all response tokens, implicitly assuming that every token provides equally useful learning signal. Recent token-level selection methods challenge… 34
arXiv — Machine Learning research 14d ago DRIFT: Refining Instruction Data via On-Policy Data Attribution arXiv:2606.18307v1 Announce Type: new Abstract: Optimizing the training data distribution for Supervised Fine-Tuning (SFT) dictates the capability of Large Language Models (LLMs). While existing data curation methods excel at accelerating training under constrained budgets, they… 23