Tag

Reasoning

500 articles archived under #reasoning · RSS

Hugging Face Daily Papers research 21d ago

Beyond Scalar Rewards by Internalizing Reasoning into Score Distributions

Abstract A teacher-student framework decouples complex reasoning from efficient reward deployment in text-to-image training, achieving superior preference accuracy and optimization performance. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Reward models are central to…

22
Hugging Face Daily Papers research 21d ago

Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models

Abstract Embodied-R1.5 is a unified embodied foundation model that integrates embodied reasoning capabilities and achieves state-of-the-art performance on embodied vision-language benchmarks through a multi-task balanced reinforcement learning approach. Generated by…

35
Hugging Face Daily Papers research 21d ago

InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning

Abstract InternVideo3 enhances long-horizon multimodal tasks through Multimodal Contextual Reasoning and efficient attention mechanisms, demonstrating strong performance on video understanding benchmarks and video agent capabilities. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

18
Hugging Face Daily Papers research 22d ago

Decentralized Multi-Agent Systems with Shared Context

Abstract Decentralized Language Models (DeLM) framework enables scalable large language model reasoning through parallel agents that asynchronously coordinate via a shared verified context, improving performance and efficiency over centralized approaches. Generated by…

25
Hugging Face Daily Papers research 22d ago

The Role of Feedback Alignment in Self-Distillation

Abstract Self-distillation effectiveness depends on structural alignment between feedback and solver reasoning, with step-aligned critique outperforming binary rewards and reference solutions by targeting specific reasoning failures. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

32
Hugging Face Daily Papers research 22d ago

Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution

Abstract Role-Agent framework enables LLM agents to function as both agent and environment through bootstrapped co-evolution, improving performance via environment-aware reasoning and targeted practice. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Although Large Language Model…

33
Hugging Face Daily Papers research 22d ago

MemDreamer: Decoupling Perception and Reasoning for Long Video Understanding via Hierarchical Graph Memory and Agentic Retrieval Mechanism

Abstract MemDreamer addresses long-video understanding challenges by decoupling perception and reasoning through hierarchical graph memory and agentic exploration, achieving state-of-the-art performance with reduced computational overhead. Generated by…

33
Hugging Face Daily Papers research 22d ago

Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

Abstract Chain-of-thought supervised fine-tuning degrades long-context recall in hybrid linear-attention models by biasing attention gradients toward short-range patterns, but a training-free method called QK-Restore can restore long-context capabilities by reverting query-key…

8
arXiv — Machine Learning research 22d ago

Rotate2Think: Geometric Priming via Orthogonal Rotation to Improve Language Model Reasoning

arXiv:2606.09873v1 Announce Type: new Abstract: Reasoning models achieve strong performance on challenging tasks by generating explicit intermediate reasoning traces before producing a final answer. Yet the internal structure of representation space when reasoning remains poorly…

29
arXiv — Machine Learning research 22d ago

TD-Grokking: Learning from Zero-Reward Problems by Training-Time Decomposition

arXiv:2606.09883v1 Announce Type: new Abstract: Large language models (LLMs) have made remarkable progress in reasoning tasks, largely driven by post-training paradigms, especially reinforcement learning with verifiable rewards (RLVR). However, a critical bottleneck persists:…

25
arXiv — NLP / Computation & Language research 22d ago

SocraticPO: Policy Optimization via Interactive Guidance

arXiv:2606.09887v1 Announce Type: cross Abstract: Reinforcement learning (RL) for large language models usually supervises reasoning with scalar outcome rewards, such as binary correctness. Such rewards provide an optimization direction but rarely explain how a model should…

5
arXiv — Machine Learning research 22d ago

IntentKV: Cross-Turn Intent-Aware KV Cache Pruning for Agent Inference

arXiv:2606.09916v1 Announce Type: new Abstract: Multi-turn LLM agents fan short queries into long trajectories of tool calls, search results, and intermediate reasoning. Both KV memory and KV read bandwidth grow by orders of magnitude across a single trajectory, making the…

24
arXiv — Machine Learning research 22d ago

Sample Where You Struggle: Sharpening Base Model Reasoning via Entropy-Guided Power Sampling

arXiv:2606.09926v1 Announce Type: new Abstract: Sampling from the sequence-level power distribution $p^\alpha$ elicits RL-level reasoning from base language models without any parameter updates, but the standard Metropolis-Hastings (MH), a Markov Chain Monte Carlo (MCMC)…

20
arXiv — NLP / Computation & Language research 22d ago

RKSC: Reasoning-Aware KV Cache Sharing and Confident Early Exit for Multi-Step LLM Inference

arXiv:2606.09937v1 Announce Type: cross Abstract: We introduce RKSC (Reasoning-Aware KV Cache Sharing), a training-free inference framework that eliminates two structural redundancies in multi-branch LLM reasoning pipelines. ASKS (Attention-Similarity KV Sharing) computes the…

32
arXiv — Machine Learning research 22d ago

Dropout-GRPO: Variational Stochasticity for Continuous Latent Reasoning

arXiv:2606.10184v1 Announce Type: new Abstract: Group Relative Policy Optimization (GRPO) relies on the diversity of $K$ rollouts within each group; otherwise, the group-mean advantage $A^{(k)} = r^{(k)} - \mu_r$ collapses to zero. This presents a structural challenge for…

9
arXiv — Machine Learning research 22d ago

Beyond Absolute Imitation: Anchored Residual Guidance for Privileged On-Policy Distillation

arXiv:2606.10385v1 Announce Type: new Abstract: On-policy distillation (OPD) has demonstrated strong empirical gains in enhancing complex reasoning in LLMs by aligning a student model with a teacher's predictive distribution over the student's own trajectories. An emerging…

36
arXiv — NLP / Computation & Language research 22d ago

Using Probabilistic Programs to Train Inductive Reasoning in Large Language Models

arXiv:2606.09856v1 Announce Type: new Abstract: Post-training Large Language Models (LLMs) for reasoning typically focuses on deductive tasks such as mathematics and coding where correctness is verifiable. Yet, many real-world reasoning problems are inductive: agents must infer…

36
arXiv — NLP / Computation & Language research 22d ago

The Confident Liar: Diagnosing Multi-Agent Debate with Log-Probabilities and LLM-as-Judge

arXiv:2606.10296v1 Announce Type: new Abstract: Multi-agent debate systems are typically evaluated only on whether the final answer is correct, overlooking the quality of the intermediate reasoning that debate is designed to produce. This paper studies the relationship between…

16
arXiv — NLP / Computation & Language research 22d ago

Early-Token Confidence Predicts Reasoning Quality in Multi-Agent LLM Debate

arXiv:2606.10307v1 Announce Type: new Abstract: Evaluating reasoning quality in multi-agent LLM systems is challenging, especially for open-ended tasks without reference answers. We investigate whether intrinsic confidence signals, token-level log-probabilities from decoding,…

14
arXiv — NLP / Computation & Language research 22d ago

TabClaw: An Interactive and Self-Evolving Agent for Spreadsheet Manipulation and Table Reasoning

arXiv:2606.10316v1 Announce Type: new Abstract: Spreadsheets and tables are widely used representations for structured data analysis, but effective analysis still requires substantial manual effort and domain expertise. Recent large language model (LLM) agents can automate parts…

31
arXiv — NLP / Computation & Language research 22d ago

KCSAT-ML: Probing Reasoning Models with Nationwide-Cohort Human Difficulty

arXiv:2606.10403v1 Announce Type: new Abstract: Math reasoning benchmarks have proliferated, yet most lack a per-item difficulty signal grounded in actual human performance. We introduce KCSAT-ML, a decade (2014-2025) of Korean College Scholastic Ability Test (KCSAT; Suneung)…

34
arXiv — NLP / Computation & Language research 22d ago

WebChallenger: A Reliable and Efficient Generalist Web Agent

arXiv:2606.10423v1 Announce Type: new Abstract: Autonomous web navigation remains challenging for LLM agents, and the strongest generalist systems rely on proprietary reasoning models whose inference cost is prohibitive for the repetitive tasks where such agents would be most…

31
arXiv — NLP / Computation & Language research 22d ago

REAL: A Reasoning-Enhanced Graph Framework for Long-Term Memory Management of LLMs

arXiv:2606.10694v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly expected to interact with users over long time horizons. However, due to their finite context window, LLMs cannot retain all past interactions, making long-term memory management…

22
arXiv — NLP / Computation & Language research 22d ago

Dep-LLM: Training-Free Depression Diagnosis via Evidence-Guided Structured Multi-factor with Reliable LLM Reasoning

arXiv:2606.10796v1 Announce Type: new Abstract: Automatic Depression Detection (ADD) from clinical interviews is a pivotal task in computational mental health, yet it remains challenging due to two critical obstacles: 1) difficulty in modeling complex but sparsely distributed…

5
arXiv — NLP / Computation & Language research 22d ago

Does Reasoning Preserve Alignment? On the Trustworthiness of Large Reasoning Models

arXiv:2606.11046v1 Announce Type: new Abstract: Instruction-tuned LLMs are increasingly converted into reasoning models through post-training to improve multi-step task performance. This conversion is usually optimized for reasoning accuracy, without explicitly preserving the…

18
arXiv — NLP / Computation & Language research 22d ago

Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

arXiv:2606.11052v1 Announce Type: new Abstract: Chain-of-thought (CoT) supervised fine-tuning (SFT) is widely adopted to improve reasoning ability, yet we find that it systematically degrades long-context recall in hybrid linear-attention models. Across architectures including…

26
arXiv — NLP / Computation & Language research 22d ago

T1-Bench: Benchmarking Multi-Scenario Agents in Real-World Domains

arXiv:2606.11070v1 Announce Type: new Abstract: Recent advances in reasoning and tool-calling capabilities of large language models (LLMs) have enabled increasingly capable agentic systems. However, existing benchmarks remain limited in task complexity, realism, and domain…

15
arXiv — NLP / Computation & Language research 22d ago

RealMath-Eval: Why SOTA Judges Struggle with Real Human Reasoning

arXiv:2606.10254v1 Announce Type: cross Abstract: While Large Language Models (LLMs) have achieved near-perfect performance in solving high-school mathematics, their ability to evaluate the diverse reasoning processes of real human students remains under-examined.…

19
arXiv — NLP / Computation & Language research 22d ago

Decoupling Thought from Speech: Knowledge-Grounded Counterfactual Reasoning for Resilient Multi-Agent Argumentation

arXiv:2606.10475v1 Announce Type: cross Abstract: Multi-agent debate frameworks have been shown to improve large language model performance in convergent tasks, but they are currently optimized in a way that heavily favors final output accuracy rather than stability of the…

18
arXiv — NLP / Computation & Language research 22d ago

How Does Reasoning Flow? Tracing Attention-Induced Information Flow for Targeted RL in LLMs

arXiv:2606.10646v1 Announce Type: cross Abstract: Token-level credit assignment remains a key obstacle for reinforcement learning (RL) in large language models (LLMs), where RL recipes typically treat all tokens equally, failing to distinguish decisive reasoning steps from…

5
Hugging Face Daily Papers research 22d ago

How Does Reasoning Flow? Tracing Attention-Induced Information Flow for Targeted RL in LLMs

Abstract FlowTracer is an RL framework that uses attention-induced graphs to trace reasoning flows and assign token-level credit based on global information propagation structures. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Token-level credit assignment remains a key obstacle…

26
Hugging Face Daily Papers research 22d ago

When the Chain of Thought Knows Better: Failure Modes in Multi-Turn Reasoning Models

Abstract Multi-turn reasoning models exhibit hidden alignment failures that are masked by traditional evaluation methods, revealing vulnerabilities through a trace-level diagnostic framework that identifies distinct failure modes including context-injection failures. Generated…

12
r/LocalLLaMA community 22d ago

Fine-tuned Qwen2.5-7B to 96% of Claude Haiku on a domain-specific task using ~$3 of API calls and zero human labelers

Built a decision-reasoning engine (Orlog) and wanted to fine-tune a local model for it instead of paying per-call forever. The method (DV-DPO): run a 3-voice council on each question and produce a synthesis; cross-examine, with losing voices challenging the synthesis; if synthesis gets…

35
Hugging Face Daily Papers research 22d ago

Send a SCOUT First: Pre-hoc Reasoning for Adaptive Detector Allocation in Prompt-Injection Defense

Abstract SCOUT framework dynamically allocates prompt-injection detection by predicting detector reliability and latency, improving safety and efficiency over fixed single-detector approaches. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Prompt-injection detectors are…

30
Hugging Face Daily Papers research 23d ago

SDR: Set-Distance Rewards for Radiology Report Generation

Abstract Set-based rewards using embedding distances improve chest X-ray report generation by enabling effective post-training and test-time selection without requiring causal reasoning structures. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Reinforcement learning with…

14
Google DeepMind official-blog 23d ago

Introducing Gemma 4 12B: a unified, encoder-free multimodal model

Introducing Gemma 4 12B: a unified, encoder-free multimodal model Jun 03, 2026 · Share x.com Facebook LinkedIn Mail Gemma 4 12B is designed to bring high-performance multimodal intelligence directly to your laptop, combining mobile-first efficiency with advanced reasoning.…

17
Hugging Face Daily Papers research 23d ago

Skill-3D: Evolving Scene-Aware Skills for Agentic 3D Spatial Reasoning

Abstract Skill-3D framework enables agents to learn scene-aware skills through self-evolving memory and skill libraries, improving tool utilization in 3D spatial reasoning tasks. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) This paper explores agentic 3D spatial understanding,…

22
Hugging Face Daily Papers research 23d ago

Reasoning over Grammar: Can Synthetic Linguistic Reasoning Traces Enhance Low-Resource Machine Translation?

Abstract Large language models can improve translation for low-resource languages through structured linguistic reasoning traces, with the most significant benefits occurring during inference rather than training. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Large language…

30
Hugging Face Daily Papers research 23d ago

OmniCap-IF: Benchmarking and Improving Instruction Following Abilities for Omni-Video Captioning

Abstract OmniCap-IF is introduced as the first comprehensive benchmark for evaluating instruction-following capabilities in omni-modal captioning, revealing significant performance disparities and a format-content tradeoff in multi-modal reasoning. Generated by…

5
Hugging Face Daily Papers research 23d ago

Optical Reasoning: Rethinking Images as an Expressive Reasoning Medium Beyond Text

Abstract Optical reasoning uses images as a standalone reasoning medium for language and multimodal tasks, achieving higher token efficiency than traditional text-based approaches. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Chain-of-Thought (CoT) improves the performance of…

27
Hugging Face Daily Papers research 23d ago

Reasoning Arena: Trace Tournaments When Verifiable Rewards Fall Short

Abstract Reasoning Arena improves reinforcement learning with verifiable rewards by using trace tournaments and Bradley-Terry models to generate meaningful gradients from non-diverse reward groups, resulting in faster training and better reasoning performance. Generated by…

15
Hugging Face Daily Papers research 23d ago

Experience Makes Skillful: Enabling Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory

Abstract SkeMex is a self-evolving framework that enhances medical agents through structured skill memory, improving long-term clinical reasoning by distinguishing useful experiences and governing memory retention based on contextual utility. Generated by…

32
Hugging Face Daily Papers research 23d ago

Cosine Misleads: Auxiliary Losses Reshape Vision Language Models, Not Their Latents

Abstract Research challenges the conventional wisdom in latent visual reasoning by demonstrating that cosine alignment between supervised latents and visual targets negatively correlates with model accuracy, while revealing that answers are decoded downstream from latents rather…

24
Hugging Face Daily Papers research 23d ago

DuMate-DeepResearch: An Auditable Multi-Agent System with Recursive Search and Rubric-Grounded Reasoning

Abstract A multi-agent framework for deep research tasks that addresses planning, evidence acquisition, and report synthesis through decoupled components and dynamic optimization mechanisms. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Deep Research (DR) has emerged as a new…

38
arXiv — Machine Learning research 23d ago

Sample-Efficient Post-Training for LEGO Spatial-Physics Reasoning

arXiv:2606.07602v1 Announce Type: new Abstract: LLM-based LEGO assembly generation requires both semantic grounding and physical feasibility. We identify a data-induced failure mode, PhysHack, in which the assemblies satisfy physical-validity constraints while producing…

12
arXiv — Machine Learning research 23d ago

MetaEvo: A Meta-Optimization Framework for Experience-Driven Agent Evolution

arXiv:2606.07603v1 Announce Type: new Abstract: Large language models (LLMs) exhibit strong reasoning capabilities, yet most LLM-based agents are statically deployed and unable to improve through task interactions. Existing experience-driven methods often rely on memory or…

32
arXiv — Machine Learning research 23d ago

Adversarial Robustness of Activation Steering in Large Language Models

arXiv:2606.07696v1 Announce Type: new Abstract: Activation steering has become a popular training-free method to control LLM behavior by injecting precomputed direction vectors into the model's residual stream at inference time. Yet its robustness to realistic input variation…

24
arXiv — Machine Learning research 23d ago

Strained Coherence: A Pre-Failure Signal in Coding Agent Execution Trajectories

arXiv:2606.07889v1 Announce Type: new Abstract: LLM-based coding agents sometimes acknowledge a problem in their own reasoning and then proceed anyway. We call this pattern strained coherence: a safety-relevant failure mode in which an agent has information that should change…

31
arXiv — Machine Learning research 23d ago

The Easy, the Hard, and the Learnable: Confidence and Difficulty-Adaptive Policy Optimization for LLM Reasoning

arXiv:2606.07950v1 Announce Type: new Abstract: RL with verifiable rewards can substantially improve LLM reasoning, yet standard GRPO-style training often treats easy, hard, and learnable questions alike through uniform sampling and weighting, leading to inefficient compute…

31
arXiv — Machine Learning research 23d ago

Enhancing AI Interpretability and Safety through Localised Architectures

arXiv:2606.07998v1 Announce Type: new Abstract: Recent advances in generative AI, especially powerful Large Language Models (LLMs) and Large Reasoning Models (LRMs), raise concerns over the interpretability, safety and sustainability of these large and opaque AI models. The…

8
