Tag: Reasoning · 500 articles archived under #reasoning
Hugging Face Daily Papers research 21d ago Beyond Scalar Rewards by Internalizing Reasoning into Score Distributions Abstract A teacher-student framework decouples complex reasoning from efficient reward deployment in text-to-image training, achieving superior preference accuracy and optimization performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Reward models are central to… 22 Hugging Face Daily Papers research 21d ago Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models Abstract Embodied-R1.5 is a unified embodied foundation model that integrates embodied reasoning capabilities and achieves state-of-the-art performance on embodied vision-language benchmarks through a multi-task balanced reinforcement learning approach. Generated by… 35 Hugging Face Daily Papers research 21d ago InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning Abstract InternVideo3 enhances long-horizon multimodal tasks through Multimodal Contextual Reasoning and efficient attention mechanisms, demonstrating strong performance on video understanding benchmarks and video agent capabilities. Generated by Qwen/Qwen2.5-Coder-32B-Instruct… 18 Hugging Face Daily Papers research 22d ago Decentralized Multi-Agent Systems with Shared Context Abstract Decentralized Language Models (DeLM) framework enables scalable large language model reasoning through parallel agents that asynchronously coordinate via a shared verified context, improving performance and efficiency over centralized approaches. Generated by… 25 Hugging Face Daily Papers research 22d ago The Role of Feedback Alignment in Self-Distillation Abstract Self-distillation effectiveness depends on structural alignment between feedback and solver reasoning, with step-aligned critique outperforming binary rewards and reference solutions by targeting specific reasoning failures.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct… 32 Hugging Face Daily Papers research 22d ago Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution Abstract Role-Agent framework enables LLM agents to function as both agent and environment through bootstrapped co-evolution, improving performance via environment-aware reasoning and targeted practice. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Although Large Language Model… 33 Hugging Face Daily Papers research 22d ago MemDreamer: Decoupling Perception and Reasoning for Long Video Understanding via Hierarchical Graph Memory and Agentic Retrieval Mechanism Abstract MemDreamer addresses long-video understanding challenges by decoupling perception and reasoning through hierarchical graph memory and agentic exploration, achieving state-of-the-art performance with reduced computational overhead. Generated by… 33 Hugging Face Daily Papers research 22d ago Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It Abstract Chain-of-thought supervised fine-tuning degrades long-context recall in hybrid linear-attention models by biasing attention gradients toward short-range patterns, but a training-free method called QK-Restore can restore long-context capabilities by reverting query-key… 8 arXiv — Machine Learning research 22d ago Rotate2Think: Geometric Priming via Orthogonal Rotation to Improve Language Model Reasoning arXiv:2606.09873v1 Announce Type: new Abstract: Reasoning models achieve strong performance on challenging tasks by generating explicit intermediate reasoning traces before producing a final answer. 
Yet the internal structure of representation space when reasoning remains poorly… 29 arXiv — Machine Learning research 22d ago TD-Grokking: Learning from Zero-Reward Problems by Training-Time Decomposition arXiv:2606.09883v1 Announce Type: new Abstract: Large language models (LLMs) have made remarkable progress in reasoning tasks, largely driven by post-training paradigms, especially reinforcement learning with verifiable rewards (RLVR). However, a critical bottleneck persists:… 25 arXiv — NLP / Computation & Language research 22d ago SocraticPO: Policy Optimization via Interactive Guidance arXiv:2606.09887v1 Announce Type: cross Abstract: Reinforcement learning (RL) for large language models usually supervises reasoning with scalar outcome rewards, such as binary correctness. Such rewards provide an optimization direction but rarely explain how a model should… 5 arXiv — Machine Learning research 22d ago IntentKV: Cross-Turn Intent-Aware KV Cache Pruning for Agent Inference arXiv:2606.09916v1 Announce Type: new Abstract: Multi-turn LLM agents fan short queries into long trajectories of tool calls, search results, and intermediate reasoning. 
Both KV memory and KV read bandwidth grow by orders of magnitude across a single trajectory, making the… 24 arXiv — Machine Learning research 22d ago Sample Where You Struggle: Sharpening Base Model Reasoning via Entropy-Guided Power Sampling arXiv:2606.09926v1 Announce Type: new Abstract: Sampling from the sequence-level power distribution $p^\alpha$ elicits RL-level reasoning from base language models without any parameter updates, but the standard Metropolis-Hastings (MH), a Markov Chain Monte Carlo (MCMC)… 20 arXiv — NLP / Computation & Language research 22d ago RKSC: Reasoning-Aware KV Cache Sharing and Confident Early Exit for Multi-Step LLM Inference arXiv:2606.09937v1 Announce Type: cross Abstract: We introduce RKSC (Reasoning-Aware KV Cache Sharing), a training-free inference framework that eliminates two structural redundancies in multi-branch LLM reasoning pipelines. ASKS (Attention-Similarity KV Sharing) computes the… 32 arXiv — Machine Learning research 22d ago Dropout-GRPO: Variational Stochasticity for Continuous Latent Reasoning arXiv:2606.10184v1 Announce Type: new Abstract: Group Relative Policy Optimization (GRPO) relies on the diversity of $K$ rollouts within each group; otherwise, the group-mean advantage $A^{(k)} = r^{(k)} - \mu_r$ collapses to zero. This presents a structural challenge for… 9 arXiv — Machine Learning research 22d ago Beyond Absolute Imitation: Anchored Residual Guidance for Privileged On-Policy Distillation arXiv:2606.10385v1 Announce Type: new Abstract: On-policy distillation (OPD) has demonstrated strong empirical gains in enhancing complex reasoning in LLMs by aligning a student model with a teacher's predictive distribution over the student's own trajectories.
An emerging… 36 arXiv — NLP / Computation & Language research 22d ago Using Probabilistic Programs to Train Inductive Reasoning in Large Language Models arXiv:2606.09856v1 Announce Type: new Abstract: Post-training Large Language Models (LLMs) for reasoning typically focuses on deductive tasks such as mathematics and coding where correctness is verifiable. Yet, many real-world reasoning problems are inductive: agents must infer… 36 arXiv — NLP / Computation & Language research 22d ago The Confident Liar: Diagnosing Multi-Agent Debate with Log-Probabilities and LLM-as-Judge arXiv:2606.10296v1 Announce Type: new Abstract: Multi-agent debate systems are typically evaluated only on whether the final answer is correct, overlooking the quality of the intermediate reasoning that debate is designed to produce. This paper studies the relationship between… 16 arXiv — NLP / Computation & Language research 22d ago Early-Token Confidence Predicts Reasoning Quality in Multi-Agent LLM Debate arXiv:2606.10307v1 Announce Type: new Abstract: Evaluating reasoning quality in multi-agent LLM systems is challenging, especially for open-ended tasks without reference answers. We investigate whether intrinsic confidence signals, token-level log-probabilities from decoding,… 14 arXiv — NLP / Computation & Language research 22d ago TabClaw: An Interactive and Self-Evolving Agent for Spreadsheet Manipulation and Table Reasoning arXiv:2606.10316v1 Announce Type: new Abstract: Spreadsheets and tables are widely used representations for structured data analysis, but effective analysis still requires substantial manual effort and domain expertise. 
Recent large language model (LLM) agents can automate parts… 31 arXiv — NLP / Computation & Language research 22d ago KCSAT-ML: Probing Reasoning Models with Nationwide-Cohort Human Difficulty arXiv:2606.10403v1 Announce Type: new Abstract: Math reasoning benchmarks have proliferated, yet most lack a per-item difficulty signal grounded in actual human performance. We introduce KCSAT-ML, a decade (2014-2025) of Korean College Scholastic Ability Test (KCSAT; Suneung)… 34 arXiv — NLP / Computation & Language research 22d ago WebChallenger: A Reliable and Efficient Generalist Web Agent arXiv:2606.10423v1 Announce Type: new Abstract: Autonomous web navigation remains challenging for LLM agents, and the strongest generalist systems rely on proprietary reasoning models whose inference cost is prohibitive for the repetitive tasks where such agents would be most… 31 arXiv — NLP / Computation & Language research 22d ago REAL: A Reasoning-Enhanced Graph Framework for Long-Term Memory Management of LLMs arXiv:2606.10694v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly expected to interact with users over long time horizons. However, due to their finite context window, LLMs cannot retain all past interactions, making long-term memory management… 22 arXiv — NLP / Computation & Language research 22d ago Dep-LLM: Training-Free Depression Diagnosis via Evidence-Guided Structured Multi-factor with Reliable LLM Reasoning arXiv:2606.10796v1 Announce Type: new Abstract: Automatic Depression Detection (ADD) from clinical interviews is a pivotal task in computational mental health, yet it remains challenging due to two critical obstacles: 1) difficulty in modeling complex but sparsely distributed… 5 arXiv — NLP / Computation & Language research 22d ago Does Reasoning Preserve Alignment? 
On the Trustworthiness of Large Reasoning Models arXiv:2606.11046v1 Announce Type: new Abstract: Instruction-tuned LLMs are increasingly converted into reasoning models through post-training to improve multi-step task performance. This conversion is usually optimized for reasoning accuracy, without explicitly preserving the… 18 arXiv — NLP / Computation & Language research 22d ago Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It arXiv:2606.11052v1 Announce Type: new Abstract: Chain-of-thought (CoT) supervised fine-tuning (SFT) is widely adopted to improve reasoning ability, yet we find that it systematically degrades long-context recall in hybrid linear-attention models. Across architectures including… 26 arXiv — NLP / Computation & Language research 22d ago T1-Bench: Benchmarking Multi-Scenario Agents in Real-World Domains arXiv:2606.11070v1 Announce Type: new Abstract: Recent advances in reasoning and tool-calling capabilities of large language models (LLMs) have enabled increasingly capable agentic systems. 
However, existing benchmarks remain limited in task complexity, realism, and domain… 15 arXiv — NLP / Computation & Language research 22d ago RealMath-Eval: Why SOTA Judges Struggle with Real Human Reasoning arXiv:2606.10254v1 Announce Type: cross Abstract: While Large Language Models (LLMs) have achieved near-perfect performance in solving high-school mathematics, their ability to evaluate the diverse reasoning processes of real human students remains under-examined.… 19 arXiv — NLP / Computation & Language research 22d ago Decoupling Thought from Speech: Knowledge-Grounded Counterfactual Reasoning for Resilient Multi-Agent Argumentation arXiv:2606.10475v1 Announce Type: cross Abstract: Multi-agent debate frameworks have been shown to improve large language model performance in convergent tasks, but they are currently optimized in a way that heavily favors final output accuracy rather than stability of the… 18 arXiv — NLP / Computation & Language research 22d ago How Does Reasoning Flow? Tracing Attention-Induced Information Flow for Targeted RL in LLMs arXiv:2606.10646v1 Announce Type: cross Abstract: Token-level credit assignment remains a key obstacle for reinforcement learning (RL) in large language models (LLMs), where RL recipes typically treat all tokens equally, failing to distinguish decisive reasoning steps from… 5 Hugging Face Daily Papers research 22d ago How Does Reasoning Flow? Tracing Attention-Induced Information Flow for Targeted RL in LLMs Abstract FlowTracer is an RL framework that uses attention-induced graphs to trace reasoning flows and assign token-level credit based on global information propagation structures.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct Token-level credit assignment remains a key obstacle… 26 Hugging Face Daily Papers research 22d ago When the Chain of Thought Knows Better: Failure Modes in Multi-Turn Reasoning Models Abstract Multi-turn reasoning models exhibit hidden alignment failures that are masked by traditional evaluation methods, revealing vulnerabilities through a trace-level diagnostic framework that identifies distinct failure modes including context-injection failures. Generated… 12 r/LocalLLaMA community 22d ago Fine-tuned Qwen2.5-7B to 96% of Claude Haiku on a domain-specific task using ~$3 of API calls and zero human labelers Built a decision-reasoning engine (Orlog) and wanted to fine-tune a local model for it instead of paying per-call forever. The method (DV-DPO): Run a 3-voice council on each question, produce a synthesis Cross-examine: losing voices challenge the synthesis If synthesis gets… 35 Hugging Face Daily Papers research 22d ago Send a SCOUT First: Pre-hoc Reasoning for Adaptive Detector Allocation in Prompt-Injection Defense Abstract SCOUT framework dynamically allocates prompt-injection detection by predicting detector reliability and latency, improving safety and efficiency over fixed single-detector approaches. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Prompt-injection detectors are… 30 Hugging Face Daily Papers research 23d ago SDR: Set-Distance Rewards for Radiology Report Generation Abstract Set-based rewards using embedding distances improve chest X-ray report generation by enabling effective post-training and test-time selection without requiring causal reasoning structures. 
Generated by Qwen/Qwen2.5-Coder-32B-Instruct Reinforcement learning with… 14 Google DeepMind official-blog 23d ago Introducing Gemma 4 12B: a unified, encoder-free multimodal model Jun 03, 2026 · Gemma 4 12B is designed to bring high-performance multimodal intelligence directly to your laptop, combining mobile-first efficiency with advanced reasoning.… 17 Hugging Face Daily Papers research 23d ago Skill-3D: Evolving Scene-Aware Skills for Agentic 3D Spatial Reasoning Abstract Skill-3D framework enables agents to learn scene-aware skills through self-evolving memory and skill libraries, improving tool utilization in 3D spatial reasoning tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct This paper explores agentic 3D spatial understanding,… 22 Hugging Face Daily Papers research 23d ago Reasoning over Grammar: Can Synthetic Linguistic Reasoning Traces Enhance Low-Resource Machine Translation? Abstract Large language models can improve translation for low-resource languages through structured linguistic reasoning traces, with the most significant benefits occurring during inference rather than training. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large language… 30 Hugging Face Daily Papers research 23d ago OmniCap-IF: Benchmarking and Improving Instruction Following Abilities for Omni-Video Captioning Abstract OmniCap-IF is introduced as the first comprehensive benchmark for evaluating instruction-following capabilities in omni-modal captioning, revealing significant performance disparities and a format-content tradeoff in multi-modal reasoning. Generated by… 5 Hugging Face Daily Papers research 23d ago Optical Reasoning: Rethinking Images as an Expressive Reasoning Medium Beyond Text Abstract Optical reasoning uses images as a standalone reasoning medium for language and multimodal tasks, achieving higher token efficiency than traditional text-based approaches.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct Chain-of-Thought (CoT) improves the performance of… 27 Hugging Face Daily Papers research 23d ago Reasoning Arena: Trace Tournaments When Verifiable Rewards Fall Short Abstract Reasoning Arena improves reinforcement learning with verifiable rewards by using trace tournaments and Bradley-Terry models to generate meaningful gradients from non-diverse reward groups, resulting in faster training and better reasoning performance. Generated by… 15 Hugging Face Daily Papers research 23d ago Experience Makes Skillful: Enabling Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory Abstract SkeMex is a self-evolving framework that enhances medical agents through structured skill memory, improving long-term clinical reasoning by distinguishing useful experiences and governing memory retention based on contextual utility. Generated by… 32 Hugging Face Daily Papers research 23d ago Cosine Misleads: Auxiliary Losses Reshape Vision Language Models, Not Their Latents Abstract Research challenges the conventional wisdom in latent visual reasoning by demonstrating that cosine alignment between supervised latents and visual targets negatively correlates with model accuracy, while revealing that answers are decoded downstream from latents rather… 24 Hugging Face Daily Papers research 23d ago DuMate-DeepResearch: An Auditable Multi-Agent System with Recursive Search and Rubric-Grounded Reasoning Abstract A multi-agent framework for deep research tasks that addresses planning, evidence acquisition, and report synthesis through decoupled components and dynamic optimization mechanisms. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Deep Research (DR) has emerged as a new… 38 arXiv — Machine Learning research 23d ago Sample-Efficient Post-Training for LEGO Spatial-Physics Reasoning arXiv:2606.07602v1 Announce Type: new Abstract: LLM-based LEGO assembly generation requires both semantic grounding and physical feasibility. 
We identify a data-induced failure mode, PhysHack, in which the assemblies satisfy physical-validity constraints while producing… 12 arXiv — Machine Learning research 23d ago MetaEvo: A Meta-Optimization Framework for Experience-Driven Agent Evolution arXiv:2606.07603v1 Announce Type: new Abstract: Large language models (LLMs) exhibit strong reasoning capabilities, yet most LLM-based agents are statically deployed and unable to improve through task interactions. Existing experience-driven methods often rely on memory or… 32 arXiv — Machine Learning research 23d ago Adversarial Robustness of Activation Steering in Large Language Models arXiv:2606.07696v1 Announce Type: new Abstract: Activation steering has become a popular training-free method to control LLM behavior by injecting precomputed direction vectors into the model's residual stream at inference time. Yet its robustness to realistic input variation… 24 arXiv — Machine Learning research 23d ago Strained Coherence: A Pre-Failure Signal in Coding Agent Execution Trajectories arXiv:2606.07889v1 Announce Type: new Abstract: LLM-based coding agents sometimes acknowledge a problem in their own reasoning and then proceed anyway. 
We call this pattern strained coherence: a safety-relevant failure mode in which an agent has information that should change… 31 arXiv — Machine Learning research 23d ago The Easy, the Hard, and the Learnable: Confidence and Difficulty-Adaptive Policy Optimization for LLM Reasoning arXiv:2606.07950v1 Announce Type: new Abstract: RL with verifiable rewards can substantially improve LLM reasoning, yet standard GRPO-style training often treats easy, hard, and learnable questions alike through uniform sampling and weighting, leading to inefficient compute… 31 arXiv — Machine Learning research 23d ago Enhancing AI Interpretability and Safety through Localised Architectures arXiv:2606.07998v1 Announce Type: new Abstract: Recent advances in generative AI, especially powerful Large Language Models (LLMs) and Large Reasoning Models (LRMs), raise concerns over the interpretability, safety and sustainability of these large and opaque AI models. The… 8