Tag: Training
450 articles archived under #training

arXiv — Machine Learning research 2h ago
FRAME: Learning the Adaptation Domain with a Mixture of Fractional-Fourier Experts arXiv:2607.00162v1 Announce Type: new Abstract: Parameter-efficient fine-tuning (PEFT) reparameterizes weight updates in a fixed basis: low-rank adapters operate in the spatial domain, while a recent line of spectral methods operates in a fixed Fourier domain. We argue that the… 36

arXiv — Machine Learning research 2h ago
Loss Smoothing for Stable Adaptation Under Distribution Shift arXiv:2607.00634v1 Announce Type: new Abstract: In settings such as fine-tuning and reinforcement learning, neural networks are often adapted under distribution shift. Standard adaptation methods typically optimize the target objective directly, inducing an abrupt change from… 38

arXiv — Machine Learning research 2h ago
Local Motion Matters: A Deconstruct-Recompose Paradigm for Reinforcement Learning Pre-training from Videos arXiv:2607.00808v1 Announce Type: new Abstract: Pre-training on large-scale videos to improve reinforcement learning efficiency is promising yet remains challenging. Existing methods typically treat the agent as an indivisible entity, modeling motion patterns globally. Such… 8

arXiv — Machine Learning research 2h ago
From Pixels to Temporal Correlations: Learning Informative Representations for Reinforcement Learning Pre-training arXiv:2607.00811v1 Announce Type: new Abstract: Unsupervised pre-training on large-scale datasets has demonstrated significant potential for improving the sample efficiency and performance of Reinforcement Learning (RL).
Given the large-scale action-free internet videos,… 13

arXiv — Machine Learning research 2h ago
Staleness-Learning Rate Scaling Laws for Asynchronous RLHF arXiv:2607.01083v1 Announce Type: new Abstract: High-throughput RLHF systems often decouple rollout generation from policy optimization, leading to the use of stale rollouts during learner updates. In this work, we study the effect of such staleness in asynchronous GRPO. We make… 23

arXiv — Machine Learning research 2h ago
ZO-Act: Efficient Zeroth-Order Fine-Tuning via One-Shot Activation-Informed Low-Rank Subspaces arXiv:2607.01125v1 Announce Type: new Abstract: Zeroth-order (ZO) optimization enables fine-tuning large language models when backpropagation is unavailable or memory-prohibitive, but existing methods often perturb full model weights or randomly constructed low-dimensional… 4

arXiv — NLP / Computation & Language research 2h ago
MultiSynt/MT: Trillion-Token Multi-Parallel Pre-Training Data Translated Across 36 Languages arXiv:2607.00890v1 Announce Type: new Abstract: Open web-scale pre-training corpora remain concentrated in English, limiting multilingual LLM development. We introduce MultiSynt/MT, an open synthetic parallel corpus with approximately 4.8 trillion target-language tokens across… 12

r/LocalLLaMA community 8h ago
My reasons to run local models I can finetune any model on any dataset I want.
I can use techniques like speculative decoding and other SOTA approaches to get the max tps. The LLM providers like Anthropic and OpenAI are not getting access to my data. The hardware is reusable for vision, text, speech, and I can run… 10

Hugging Face Daily Papers research 10h ago
SpheRoPE: Zero-Shot Optimization-Free 360 Panorama Generation with Spherical RoPE Abstract A novel zero-shot framework injects spherical priors into pre-trained diffusion transformers for 360 panoramic generation, using spherical RoPE and semantic distortion guidance to overcome topological constraints without training or optimization. Generated by… 35

Hugging Face Daily Papers research 12h ago
Play2Perfect: What Matters in Dexterous Play Pretraining for Precise Assembly? Abstract A reinforcement learning framework called Play2Perfect enables sample-efficient robotic assembly tasks by first learning general manipulation skills through playful interaction with diverse objects, then adapting these skills for precise assembly through fine-tuning.… 34

NVIDIA Developer Blog official-blog 13h ago
Mastering Agentic Techniques: AI Agent Reinforcement Learning Reinforcement learning (RL) is central to aligning language models, from reinforcement learning with human feedback (RLHF) within AI assistants to newer... 38

r/LocalLLaMA community 14h ago
Open Models - June 2026 After overwhelming April, OK May, here's June. Yeah, the graph has fewer items because we got other items here last month. Finetunes: Nex-N2 Ornith-1.0 Agents-A1 Holo3.1 Tmax-27b MusaCoder-27B VibeThinker-3B NVFP4 from NVIDIA for below models:… 8

r/LocalLLaMA community 18h ago
Hister: Give Your AI Assistant a Private Memory I have been working on Hister, a self hosted search engine that automatically indexes pages you visit, local files, and documentation, then keeps them searchable with stored offline previews.
It also exposes an MCP endpoint, so local AI assistants can search your own indexed… 5

Hugging Face Daily Papers research 20h ago
MuSViT: A Foundation Vision Model for Sheet Music Representation Abstract MuSViT is a vision transformer-based foundation model pre-trained on millions of sheet music pages that demonstrates superior performance in music score recognition and symbol detection tasks through both linear probing and fine-tuning approaches. Generated by… 10

llama.cpp releases dev-tools 1d ago
b9853 ui: Remove PWA navigate fallback to prevent caching API endpoint requ… 7

arXiv — Machine Learning research 1d ago
Fora: From Weight-Space to Function-Space Protection in Capability-Preserving Fine-Tuning arXiv:2606.31092v1 Announce Type: new Abstract: Full fine-tuning adapts large language models to new tasks but can erode capabilities they already possess. Existing remedies protect through proxies such as parameter distances, importance penalties, output matching, or dominant… 11

arXiv — NLP / Computation & Language research 1d ago
ComplianceGate: Classifier-Gated Multi-Tier LLM Routing for Inference in Regulated Industries arXiv:2606.31163v1 Announce Type: cross Abstract: Large language models deployed in regulated industries operate under two constraints: compliance enforcement and cost efficiency.
Personally identifiable information (PII) in user queries can reach model endpoints before the… 14

arXiv — Machine Learning research 1d ago
Mixture-of-Control: State-Aware Fine-Tuning for Transformer-based Models arXiv:2606.31397v1 Announce Type: new Abstract: State-based fine-tuning has emerged as a compelling alternative to weight-based adaptation for transformers, updating lightweight controls into states rather than model weights, offering substantial memory savings while retaining… 27

arXiv — Machine Learning research 1d ago
Evil Spectra: How Optimisers can Amplify or Suppress Emergent Misalignment arXiv:2606.31591v1 Announce Type: new Abstract: Emergent misalignment (EM) is a recently discovered phenomenon in LLMs where fine-tuning on a narrow misaligned task, such as writing insecure code, leads to broadly misaligned behaviour on unrelated prompts. Previous work has… 11

arXiv — Machine Learning research 1d ago
Nonlinearity-Aware LoRA: Structured Gate Adaptation under Low-Rank Constraints arXiv:2606.31717v1 Announce Type: new Abstract: Low-rank adaptation (LoRA) is commonly viewed as an update-space approximation to full fine-tuning, yet this view is incomplete for self-gated Transformer feed-forward networks. In gated FFNs, a low-rank residual can change not… 13

arXiv — Machine Learning research 1d ago
Geometry-Preserving Orthonormal Initialization for Low-Rank Adaptation in RLVR arXiv:2606.31813v1 Announce Type: new Abstract: Low-rank adaptation (LoRA) and its variants enable parameter-efficient fine-tuning of large language models under the supervised fine-tuning (SFT) paradigm.
However, their efficacy and behavior under Reinforcement learning with… 24

arXiv — NLP / Computation & Language research 1d ago
Information Terra: A Narrative-Anchored Semantic-First Projection of Document Embeddings arXiv:2606.30824v1 Announce Type: cross Abstract: We introduce Information Terra, a narrative-anchored semantic-first projection that places a document corpus on an Earth-like globe whose poles are two user-chosen endpoint documents and whose prime meridian is the great-circle… 28

Hugging Face Daily Papers research 1d ago
Evolution Fine-Tuning: Learning to Discover Across 371 Optimization Tasks Abstract Evolutionary fine-tuning enables large language models to develop cross-task problem-solving capabilities by learning from search trajectories, demonstrating improved performance on mathematical conjectures and optimization tasks. Generated by… 11

Hugging Face Daily Papers research 1d ago
A Gravitational Interpretation of Fine-Tuning Reversion Abstract Post-alignment safety degradation arises from geometric properties of training history, where fine-tuning reversion follows a persistent direction defined by early training dynamics. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Fine-tuning on harmless data can partially… 35

Hugging Face Daily Papers research 1d ago
RaysUp: Ultra-light Universal Feature Upsampling via Geometry-Aware Ray Representation Abstract RaysUp is a lightweight, task-agnostic feature upsampling framework that reconstructs high-resolution features using geometry-aware ray domain techniques with improved efficiency and accuracy.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct Pre-trained Vision Foundation… 37

Hugging Face Daily Papers research 1d ago
ZooClaw-FashionSigLIP2: Distilled Fine-tuning for Robust Fashion Retrieval Abstract A fashion-specialized vision-language model achieves superior retrieval performance through full fine-tuning with knowledge distillation and weight interpolation, outperforming existing methods on a new benchmark while addressing structural biases in existing datasets.… 32

arXiv — Machine Learning research 2d ago
A Gravitational Interpretation of Fine-Tuning Reversion arXiv:2606.28525v1 Announce Type: new Abstract: Fine-tuning on harmless data can partially undo behaviors acquired earlier in training. Safety can erode under benign post-alignment updates, unlearned capabilities can re-emerge, latent traits can transfer through apparently… 27

arXiv — Machine Learning research 2d ago
DLR: Zero-Inference-Cost Latent Residuals for Low-Rank Pre-Training arXiv:2606.28932v1 Announce Type: new Abstract: Large language models have driven recent progress in language and multimodal AI, yet pre-training them at scale is prohibitively expensive.
Low-rank pre-training, which factorizes each weight matrix into a rank-r product to reduce… 35

arXiv — Machine Learning research 2d ago
BaRA: Bayesian Adaptive Rank Allocation for Parameter-Efficient Fine-Tuning arXiv:2606.29184v1 Announce Type: new Abstract: While Low-rank adaptation (LoRA) enables highly efficient fine-tuning by constraining task-specific updates to fixed low-rank subspaces, this rigid design limits representational flexibility and often results in overconfident… 15

arXiv — Machine Learning research 2d ago
Optimizer Memory Makes Shuffle Order a First-Order Source of Fine-Tuning Noise arXiv:2606.29554v1 Announce Type: new Abstract: Shuffle order can be a larger source of fine-tuning noise than a memoryless analysis predicts: fixed-clock optimizer memory makes local equal-multiset contrasts first order in the learning rate rather than second order, and the… 8

arXiv — NLP / Computation & Language research 2d ago
The Heterogeneous Safety Impacts of Benign Multilingual Fine-Tuning arXiv:2606.28843v1 Announce Type: new Abstract: Fine-tuning a large language model is a ubiquitous method for enhancing its capability on a specific downstream task. However, prior work has shown that this increase in capability comes with a cost: it can increase a model's… 18

arXiv — NLP / Computation & Language research 2d ago
PASTA: A Paraphrasing And Self-Training Approach for Knowledge Updating in LLMs arXiv:2606.28898v1 Announce Type: new Abstract: Knowledge updating in pre-trained Large Language Models (LLMs) remains an important challenge.
While continual training provides a potential avenue for knowledge updating, it continues to present substantial technical difficulties.… 20

arXiv — NLP / Computation & Language research 2d ago
Fine-Tuning General-Purpose Large Language Models for Agricultural Applications: A Reproducible Framework and Evaluation Protocol Based on Qwen3-8B arXiv:2606.28992v1 Announce Type: new Abstract: General-purpose large language models (LLMs) have demonstrated strong abilities in open-domain question answering, information extraction, and text generation. Agricultural applications, however, are domain-specific,… 20

arXiv — NLP / Computation & Language research 2d ago
Evolution Fine-Tuning: Learning to Discover Across 371 Optimization Tasks arXiv:2606.29082v1 Announce Type: new Abstract: Would experience designing faster GPU kernels also help close in on a long-standing open mathematical conjecture? Large Language Models (LLMs) integrated into evolutionary search have recently produced state-of-the-art solutions on… 4

arXiv — NLP / Computation & Language research 2d ago
Do We Still Need Fine Tuning? Turkish Sentiment Analysis in the Era of Large Language Model arXiv:2606.29614v1 Announce Type: new Abstract: This study examines whether supervised fine-tuning remains necessary for Turkish sentiment analysis in the era of large language models. We compare classical machine learning methods, fine-tuned pretrained language models, and… 35

arXiv — NLP / Computation & Language research 2d ago
SrDetection: A Self-Referential Framework for Data Leakage Detection in Code Large Language Models arXiv:2606.29815v1 Announce Type: new Abstract: Evaluating code large language models (Code LLMs) requires reliable detection of data leakage, where benchmark performance is artificially inflated by exposure to benchmark data during pre-training.
Existing approaches either… 7

Hugging Face Daily Papers research 2d ago
Scaling the Horizon, Not the Parameters: Reaching Trillion-Parameter Performance with a 35B Agent Abstract Agents-A1, a 35B Mixture-of-Experts Agentic Model, achieves trillion-parameter-level performance through long-horizon trajectory scaling and heterogeneous agent ability scaling via a three-stage training approach involving supervised fine-tuning, domain-level teacher… 28

Vercel — AI dev-tools 2d ago
Expanded Audit Log coverage, now delivered through Vercel Drains Audit Logs now capture 400+ unique team activity events, giving teams broader coverage for security reviews, compliance workflows, and investigations. With Vercel Drains support, teams can export those events to custom HTTP endpoints or Amazon S3, replacing Custom SIEM Log… 6

r/MachineLearning community 2d ago
I'm trying to implement CALM paper, and I have some questions. [P] Hello, I'm trying to implement the Pocket TTS by kyutai-labs described in this paper. Since they haven't released the training/fine-tuning code, I'm trying to implement it on my own to learn some stuff. I have read the paper, tried to implement it with much more… 34

r/LocalLLaMA community 3d ago
Update: First Manual Results from Testing Procedural Skill Transfer in Small Models Yesterday I posted an idea for testing whether a large model can transfer some of its procedural skill to a smaller model without fine-tuning. The short version of the idea was this: Small models are often not completely lacking knowledge. They know the syntax.
They know the… 18

arXiv — Machine Learning research 3d ago
PEBS: Per-rater Empirical-Bayes Shrinkage for RLHF Reward-Model Calibration arXiv:2606.27578v1 Announce Type: new Abstract: Reward models for Reinforcement Learning from Human Feedback (RLHF) pool preferences across thousands of annotators and fit one global affine calibrator, collapsing raters with systematically different rating-scale offsets and… 36

arXiv — Machine Learning research 3d ago
Retroactive Advantage Correction: Closed-Form V-Trace Bias Correction for Delay-Aware RLHF arXiv:2606.27580v1 Announce Type: new Abstract: Reinforcement learning from human feedback (RLHF) in production does not always have a synchronous reward signal. Code-execution verifiers, slow judge ensembles, and queued human review can return several gradient steps after the… 14

arXiv — Machine Learning research 3d ago
Two-Stage Fine-Tuning for Protein Sequence Generation with Targeted Amino-Acid Composition arXiv:2606.27939v1 Announce Type: new Abstract: Protein language models are standard priors for biological sequence generation, but steering them toward explicit distributional design targets remains largely unexplored. We study a constrained protein generation problem in which… 24

arXiv — Machine Learning research 3d ago
When One Adapter Speaks for Many: Discovering Low-Rank Redundancy in Continual Fine-Tuning arXiv:2606.28117v1 Announce Type: new Abstract: Low-Rank Adaptation (LoRA) has become the standard tool for parameter-efficient fine-tuning of large pretrained models.
When applied sequentially across tasks in Continual Learning (CL), the standard assumption is that each new… 38

arXiv — Machine Learning research 3d ago
Qwen-Image-2.0-RL Technical Report arXiv:2606.27608v1 Announce Type: cross Abstract: We present Qwen-Image-2.0-RL, a post-training pipeline that applies reinforcement learning from human feedback (RLHF) and on-policy distillation (OPD) to improve both the visual quality and instruction-following capability of the… 34

arXiv — NLP / Computation & Language research 3d ago
Causal Connections: Leveraging Multilingual Fine-Tuning for Financial QA@FinCausal 2026 arXiv:2606.27446v1 Announce Type: new Abstract: This paper describes team HSA_CORAL's submission to the FinCausal 2026 shared task on extracting cause-effect relations from financial narratives via extractive question answering in English and Spanish. We compare three modeling… 4

arXiv — NLP / Computation & Language research 3d ago
Low-Agreeableness Persona Conditioning for Safe LLM Fine-Tuning arXiv:2606.27709v1 Announce Type: new Abstract: Recent work has shown that fine-tuning large language models (LLMs) for social warmth degrades factual reliability and increases sycophancy. We investigate a related but distinct failure mode: warmth fine-tuning also weakens… 22

arXiv — NLP / Computation & Language research 3d ago
HPRO: Hierarchical Progressive Reward Optimization via Preference Extraction for Emotional Text-to-Speech arXiv:2606.28249v1 Announce Type: cross Abstract: Recently, Large Language Model (LLM)-based Text-to-Speech (TTS) models have achieved remarkable naturalness. However, the standard Supervised Fine-Tuning paradigm often converges to statistically averaged prosody, limiting… 20

arXiv — NLP / Computation & Language research 3d ago
Continual Memorization of Factoids in Language Models arXiv:2411.07175v3 Announce Type: replace Abstract: As new knowledge rapidly accumulates, language models (LMs) with pretrained knowledge quickly become obsolete.
A common approach to updating LMs is fine-tuning them directly on new knowledge. However, recent studies have shown… 27

r/LocalLLaMA community 3d ago
MLX Fine-Tune Example Guide A Local MLX Fine-Tuning Experiment Just finished a local LoRA fine-tune of a 7B instruction model on Apple Silicon, via MLX, teaching it a high-fantasy literary register (Gene Wolfe and Tolkien). This is a more rigorous version with more data of something I tried two years ago… 14

Page 1 of 9 · 450 articles