Tag

Training

450 articles archived under #training · RSS

arXiv — Machine Learning research 2h ago

FRAME: Learning the Adaptation Domain with a Mixture of Fractional-Fourier Experts

arXiv:2607.00162v1 Announce Type: new Abstract: Parameter-efficient fine-tuning (PEFT) reparameterizes weight updates in a fixed basis: low-rank adapters operate in the spatial domain, while a recent line of spectral methods operates in a fixed Fourier domain. We argue that the…

36
arXiv — Machine Learning research 2h ago

Loss Smoothing for Stable Adaptation Under Distribution Shift

arXiv:2607.00634v1 Announce Type: new Abstract: In settings such as fine-tuning and reinforcement learning, neural networks are often adapted under distribution shift. Standard adaptation methods typically optimize the target objective directly, inducing an abrupt change from…

38
arXiv — Machine Learning research 2h ago

Local Motion Matters: A Deconstruct-Recompose Paradigm for Reinforcement Learning Pre-training from Videos

arXiv:2607.00808v1 Announce Type: new Abstract: Pre-training on large-scale videos to improve reinforcement learning efficiency is promising yet remains challenging. Existing methods typically treat the agent as an indivisible entity, modeling motion patterns globally. Such…

8
arXiv — Machine Learning research 2h ago

From Pixels to Temporal Correlations: Learning Informative Representations for Reinforcement Learning Pre-training

arXiv:2607.00811v1 Announce Type: new Abstract: Unsupervised pre-training on large-scale datasets has demonstrated significant potential for improving the sample efficiency and performance of Reinforcement Learning (RL). Given the large-scale action-free internet videos,…

13
arXiv — Machine Learning research 2h ago

Staleness-Learning Rate Scaling Laws for Asynchronous RLHF

arXiv:2607.01083v1 Announce Type: new Abstract: High-throughput RLHF systems often decouple rollout generation from policy optimization, leading to the use of stale rollouts during learner updates. In this work, we study the effect of such staleness in asynchronous GRPO. We make…

23
arXiv — Machine Learning research 2h ago

ZO-Act: Efficient Zeroth-Order Fine-Tuning via One-Shot Activation-Informed Low-Rank Subspaces

arXiv:2607.01125v1 Announce Type: new Abstract: Zeroth-order (ZO) optimization enables fine-tuning large language models when backpropagation is unavailable or memory-prohibitive, but existing methods often perturb full model weights or randomly constructed low-dimensional…

4
arXiv — NLP / Computation & Language research 2h ago

MultiSynt/MT: Trillion-Token Multi-Parallel Pre-Training Data Translated Across 36 Languages

arXiv:2607.00890v1 Announce Type: new Abstract: Open web-scale pre-training corpora remain concentrated in English, limiting multilingual LLM development. We introduce MultiSynt/MT, an open synthetic parallel corpus with approximately 4.8 trillion target-language tokens across…

12
r/LocalLLaMA community 8h ago

My reasons to run local models

I can fine-tune any model on any dataset I want. I can use techniques like speculative decoding and other SOTA approaches to get the maximum tokens per second. LLM providers like Anthropic and OpenAI are not getting access to my data. The hardware is reusable for vision, text, and speech, and I can run…

10
Hugging Face Daily Papers research 10h ago

SpheRoPE: Zero-Shot Optimization-Free 360 Panorama Generation with Spherical RoPE

Abstract A novel zero-shot framework injects spherical priors into pre-trained diffusion transformers for 360 panoramic generation, using spherical RoPE and semantic distortion guidance to overcome topological constraints without training or optimization. Generated by…

35
Hugging Face Daily Papers research 12h ago

Play2Perfect: What Matters in Dexterous Play Pretraining for Precise Assembly?

Abstract A reinforcement learning framework called Play2Perfect enables sample-efficient robotic assembly tasks by first learning general manipulation skills through playful interaction with diverse objects, then adapting these skills for precise assembly through fine-tuning.…

34
NVIDIA Developer Blog official-blog 13h ago

Mastering Agentic Techniques: AI Agent Reinforcement Learning

Reinforcement learning (RL) is central to aligning language models, from reinforcement learning with human feedback (RLHF) within AI assistants to newer...

38
r/LocalLLaMA community 14h ago

Open Models - June 2026

After an overwhelming April and an OK May, here's June. Yes, the graph has fewer items, because we got other items here last month. Finetunes: Nex-N2, Ornith-1.0, Agents-A1, Holo3.1, Tmax-27b, MusaCoder-27B, VibeThinker-3B. NVFP4 from NVIDIA for the models below:…

8
r/LocalLLaMA community 18h ago

Hister: Give Your AI Assistant a Private Memory

I have been working on Hister, a self hosted search engine that automatically indexes pages you visit, local files, and documentation, then keeps them searchable with stored offline previews. It also exposes an MCP endpoint, so local AI assistants can search your own indexed…

5
Hugging Face Daily Papers research 20h ago

MuSViT: A Foundation Vision Model for Sheet Music Representation

Abstract MuSViT is a vision transformer-based foundation model pre-trained on millions of sheet music pages that demonstrates superior performance in music score recognition and symbol detection tasks through both linear probing and fine-tuning approaches. Generated by…

10
llama.cpp releases dev-tools 1d ago

b9853

ui: Remove PWA navigate fallback to prevent caching API endpoint requ…

7
arXiv — Machine Learning research 1d ago

Fora: From Weight-Space to Function-Space Protection in Capability-Preserving Fine-Tuning

arXiv:2606.31092v1 Announce Type: new Abstract: Full fine-tuning adapts large language models to new tasks but can erode capabilities they already possess. Existing remedies protect through proxies such as parameter distances, importance penalties, output matching, or dominant…

11
arXiv — NLP / Computation & Language research 1d ago

ComplianceGate: Classifier-Gated Multi-Tier LLM Routing for Inference in Regulated Industries

arXiv:2606.31163v1 Announce Type: cross Abstract: Large language models deployed in regulated industries operate under two constraints: compliance enforcement and cost efficiency. Personally identifiable information (PII) in user queries can reach model endpoints before the…

14
arXiv — Machine Learning research 1d ago

Mixture-of-Control: State-Aware Fine-Tuning for Transformer-based Models

arXiv:2606.31397v1 Announce Type: new Abstract: State-based fine-tuning has emerged as a compelling alternative to weight-based adaptation for transformers, updating lightweight controls into states rather than model weights, offering substantial memory savings while retaining…

27
arXiv — Machine Learning research 1d ago

Evil Spectra: How Optimisers can Amplify or Suppress Emergent Misalignment

arXiv:2606.31591v1 Announce Type: new Abstract: Emergent misalignment (EM) is a recently discovered phenomenon in LLMs where fine-tuning on a narrow misaligned task, such as writing insecure code, leads to broadly misaligned behaviour on unrelated prompts. Previous work has…

11
arXiv — Machine Learning research 1d ago

Nonlinearity-Aware LoRA: Structured Gate Adaptation under Low-Rank Constraints

arXiv:2606.31717v1 Announce Type: new Abstract: Low-rank adaptation (LoRA) is commonly viewed as an update-space approximation to full fine-tuning, yet this view is incomplete for self-gated Transformer feed-forward networks. In gated FFNs, a low-rank residual can change not…

13
arXiv — Machine Learning research 1d ago

Geometry-Preserving Orthonormal Initialization for Low-Rank Adaptation in RLVR

arXiv:2606.31813v1 Announce Type: new Abstract: Low-rank adaptation (LoRA) and its variants enable parameter-efficient fine-tuning of large language models under the supervised fine-tuning (SFT) paradigm. However, their efficacy and behavior under Reinforcement learning with…

24
arXiv — NLP / Computation & Language research 1d ago

Information Terra: A Narrative-Anchored Semantic-First Projection of Document Embeddings

arXiv:2606.30824v1 Announce Type: cross Abstract: We introduce Information Terra, a narrative-anchored semantic-first projection that places a document corpus on an Earth-like globe whose poles are two user-chosen endpoint documents and whose prime meridian is the great-circle…

28
Hugging Face Daily Papers research 1d ago

Evolution Fine-Tuning: Learning to Discover Across 371 Optimization Tasks

Abstract Evolutionary fine-tuning enables large language models to develop cross-task problem-solving capabilities by learning from search trajectories, demonstrating improved performance on mathematical conjectures and optimization tasks. Generated by…

11
Hugging Face Daily Papers research 1d ago

A Gravitational Interpretation of Fine-Tuning Reversion

Abstract Post-alignment safety degradation arises from geometric properties of training history, where fine-tuning reversion follows a persistent direction defined by early training dynamics. Generated by Qwen/Qwen2.5-Coder-32B-Instruct. Fine-tuning on harmless data can partially…

35
Hugging Face Daily Papers research 1d ago

RaysUp: Ultra-light Universal Feature Upsampling via Geometry-Aware Ray Representation

Abstract RaysUp is a lightweight, task-agnostic feature upsampling framework that reconstructs high-resolution features using geometry-aware ray domain techniques with improved efficiency and accuracy. Generated by Qwen/Qwen2.5-Coder-32B-Instruct. Pre-trained Vision Foundation…

37
Hugging Face Daily Papers research 1d ago

ZooClaw-FashionSigLIP2: Distilled Fine-tuning for Robust Fashion Retrieval

Abstract A fashion-specialized vision-language model achieves superior retrieval performance through full fine-tuning with knowledge distillation and weight interpolation, outperforming existing methods on a new benchmark while addressing structural biases in existing datasets.…

32
arXiv — Machine Learning research 2d ago

A Gravitational Interpretation of Fine-Tuning Reversion

arXiv:2606.28525v1 Announce Type: new Abstract: Fine-tuning on harmless data can partially undo behaviors acquired earlier in training. Safety can erode under benign post-alignment updates, unlearned capabilities can re-emerge, latent traits can transfer through apparently…

27
arXiv — Machine Learning research 2d ago

DLR: Zero-Inference-Cost Latent Residuals for Low-Rank Pre-Training

arXiv:2606.28932v1 Announce Type: new Abstract: Large language models have driven recent progress in language and multimodal AI, yet pre-training them at scale is prohibitively expensive. Low-rank pre-training, which factorizes each weight matrix into a rank-r product to reduce…

35
arXiv — Machine Learning research 2d ago

BaRA: Bayesian Adaptive Rank Allocation for Parameter-Efficient Fine-Tuning

arXiv:2606.29184v1 Announce Type: new Abstract: While Low-rank adaptation (LoRA) enables highly efficient fine-tuning by constraining task-specific updates to fixed low-rank subspaces, this rigid design limits representational flexibility and often results in overconfident…

15
arXiv — Machine Learning research 2d ago

Optimizer Memory Makes Shuffle Order a First-Order Source of Fine-Tuning Noise

arXiv:2606.29554v1 Announce Type: new Abstract: Shuffle order can be a larger source of fine-tuning noise than a memoryless analysis predicts: fixed-clock optimizer memory makes local equal-multiset contrasts first order in the learning rate rather than second order, and the…

8
arXiv — NLP / Computation & Language research 2d ago

The Heterogeneous Safety Impacts of Benign Multilingual Fine-Tuning

arXiv:2606.28843v1 Announce Type: new Abstract: Fine-tuning a large language model is a ubiquitous method for enhancing its capability on a specific downstream task. However, prior work has shown that this increase in capability comes with a cost: it can increase a model's…

18
arXiv — NLP / Computation & Language research 2d ago

PASTA: A Paraphrasing And Self-Training Approach for Knowledge Updating in LLMs

arXiv:2606.28898v1 Announce Type: new Abstract: Knowledge updating in pre-trained Large Language Models (LLMs) remains an important challenge. While continual training provides a potential avenue for knowledge updating, it continues to present substantial technical difficulties.…

20
arXiv — NLP / Computation & Language research 2d ago

Fine-Tuning General-Purpose Large Language Models for Agricultural Applications: A Reproducible Framework and Evaluation Protocol Based on Qwen3-8B

arXiv:2606.28992v1 Announce Type: new Abstract: General-purpose large language models (LLMs) have demonstrated strong abilities in open-domain question answering, information extraction, and text generation. Agricultural applications, however, are domain-specific,…

20
arXiv — NLP / Computation & Language research 2d ago

Evolution Fine-Tuning: Learning to Discover Across 371 Optimization Tasks

arXiv:2606.29082v1 Announce Type: new Abstract: Would experience designing faster GPU kernels also help close in on a long-standing open mathematical conjecture? Large Language Models (LLMs) integrated into evolutionary search have recently produced state-of-the-art solutions on…

4
arXiv — NLP / Computation & Language research 2d ago

Do We Still Need Fine Tuning? Turkish Sentiment Analysis in the Era of Large Language Model

arXiv:2606.29614v1 Announce Type: new Abstract: This study examines whether supervised fine-tuning remains necessary for Turkish sentiment analysis in the era of large language models. We compare classical machine learning methods, fine-tuned pretrained language models, and…

35
arXiv — NLP / Computation & Language research 2d ago

SrDetection: A Self-Referential Framework for Data Leakage Detection in Code Large Language Models

arXiv:2606.29815v1 Announce Type: new Abstract: Evaluating code large language models (Code LLMs) requires reliable detection of data leakage, where benchmark performance is artificially inflated by exposure to benchmark data during pre-training. Existing approaches either…

7
Hugging Face Daily Papers research 2d ago

Scaling the Horizon, Not the Parameters: Reaching Trillion-Parameter Performance with a 35B Agent

Abstract Agents-A1, a 35B Mixture-of-Experts Agentic Model, achieves trillion-parameter-level performance through long-horizon trajectory scaling and heterogeneous agent ability scaling via a three-stage training approach involving supervised fine-tuning, domain-level teacher…

28
Vercel — AI dev-tools 2d ago

Expanded Audit Log coverage, now delivered through Vercel Drains

Audit Logs now capture 400+ unique team activity events, giving teams broader coverage for security reviews, compliance workflows, and investigations. With Vercel Drains support, teams can export those events to custom HTTP endpoints or Amazon S3, replacing Custom SIEM Log…

6
r/MachineLearning community 2d ago

I'm trying to implement CALM paper, and I have some questions. [P]

Hello, I'm trying to implement Pocket TTS by kyutai-labs, presented in this paper. Since they haven't released the training/fine-tuning code, I'm trying to implement it on my own to learn some stuff. I have read the paper, tried to implement it with much more…

34
r/LocalLLaMA community 3d ago

Update: First Manual Results from Testing Procedural Skill Transfer in Small Models

Yesterday I posted an idea for testing whether a large model can transfer some of its procedural skill to a smaller model without fine-tuning. The short version of the idea was this: Small models are often not completely lacking knowledge. They know the syntax. They know the…

18
arXiv — Machine Learning research 3d ago

PEBS: Per-rater Empirical-Bayes Shrinkage for RLHF Reward-Model Calibration

arXiv:2606.27578v1 Announce Type: new Abstract: Reward models for Reinforcement Learning from Human Feedback (RLHF) pool preferences across thousands of annotators and fit one global affine calibrator, collapsing raters with systematically different rating-scale offsets and…

36
arXiv — Machine Learning research 3d ago

Retroactive Advantage Correction: Closed-Form V-Trace Bias Correction for Delay-Aware RLHF

arXiv:2606.27580v1 Announce Type: new Abstract: Reinforcement learning from human feedback (RLHF) in production does not always have a synchronous reward signal. Code-execution verifiers, slow judge ensembles, and queued human review can return several gradient steps after the…

14
arXiv — Machine Learning research 3d ago

Two-Stage Fine-Tuning for Protein Sequence Generation with Targeted Amino-Acid Composition

arXiv:2606.27939v1 Announce Type: new Abstract: Protein language models are standard priors for biological sequence generation, but steering them toward explicit distributional design targets remains largely unexplored. We study a constrained protein generation problem in which…

24
arXiv — Machine Learning research 3d ago

When One Adapter Speaks for Many: Discovering Low-Rank Redundancy in Continual Fine-Tuning

arXiv:2606.28117v1 Announce Type: new Abstract: Low-Rank Adaptation (LoRA) has become the standard tool for parameter-efficient fine-tuning of large pretrained models. When applied sequentially across tasks in Continual Learning (CL), the standard assumption is that each new…

38
arXiv — Machine Learning research 3d ago

Qwen-Image-2.0-RL Technical Report

arXiv:2606.27608v1 Announce Type: cross Abstract: We present Qwen-Image-2.0-RL, a post-training pipeline that applies reinforcement learning from human feedback (RLHF) and on-policy distillation (OPD) to improve both the visual quality and instruction-following capability of the…

34
arXiv — NLP / Computation & Language research 3d ago

Causal Connections: Leveraging Multilingual Fine-Tuning for Financial QA@FinCausal 2026

arXiv:2606.27446v1 Announce Type: new Abstract: This paper describes team HSA_CORAL's submission to the FinCausal 2026 shared task on extracting cause-effect relations from financial narratives via extractive question answering in English and Spanish. We compare three modeling…

4
arXiv — NLP / Computation & Language research 3d ago

Low-Agreeableness Persona Conditioning for Safe LLM Fine-Tuning

arXiv:2606.27709v1 Announce Type: new Abstract: Recent work has shown that fine-tuning large language models (LLMs) for social warmth degrades factual reliability and increases sycophancy. We investigate a related but distinct failure mode: warmth fine-tuning also weakens…

22
arXiv — NLP / Computation & Language research 3d ago

HPRO: Hierarchical Progressive Reward Optimization via Preference Extraction for Emotional Text-to-Speech

arXiv:2606.28249v1 Announce Type: cross Abstract: Recently, Large Language Model (LLM)-based Text-to-Speech (TTS) models have achieved remarkable naturalness. However, the standard Supervised Fine-Tuning paradigm often converges to statistically averaged prosody, limiting…

20
arXiv — NLP / Computation & Language research 3d ago

Continual Memorization of Factoids in Language Models

arXiv:2411.07175v3 Announce Type: replace Abstract: As new knowledge rapidly accumulates, language models (LMs) with pretrained knowledge quickly become obsolete. A common approach to updating LMs is fine-tuning them directly on new knowledge. However, recent studies have shown…

27
r/LocalLLaMA community 3d ago

MLX Fine-Tune Example Guide

A Local MLX Fine-Tuning Experiment: Just finished a local LoRA fine-tune of a 7B instruction model on Apple Silicon, via MLX, teaching it a high-fantasy literary register (Gene Wolfe and Tolkien). This is a more rigorous version, with more data, of something I tried two years ago…

14