
Reasoning

500 articles archived under #reasoning

arXiv — Machine Learning research 27d ago

On Advantage Estimates for Max@K Policy Gradients

arXiv:2606.06080v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards is widely used for post-training reasoning models, but sparse outcome rewards make exploration difficult. A complementary approach is to optimize inference-time objectives such as…

19
arXiv — NLP / Computation & Language research 27d ago

Multi-Granularity Reasoning for Natural Language Inference

arXiv:2606.05181v1 Announce Type: new Abstract: Natural Language Inference (NLI) is a fundamental task in natural language understanding that requires determining the logical relationship between a premise and a hypothesis. Despite the remarkable success of transformer-based…

31
arXiv — NLP / Computation & Language research 27d ago

LoRi: Low-Rank Distillation for Implicit Reasoning

arXiv:2606.05315v1 Announce Type: new Abstract: Implicit chain-of-thought (iCoT) methods aim to internalize reasoning in large language models, but often underperform explicit CoT prompting. We empirically find that hidden-state reasoning trajectories exhibit low-rank structure.…

36
arXiv — NLP / Computation & Language research 27d ago

ReasoningFlow: Discourse Structures for Understanding LLM Reasoning Traces

arXiv:2606.05402v1 Announce Type: new Abstract: Large reasoning models (LRMs) produce reasoning traces with non-linear structures, such as backtracking and self-correction, that complicate the evaluation and monitoring of the reasoning process. We introduce ReasoningFlow, a…

30
arXiv — NLP / Computation & Language research 27d ago

Beyond tokens: a unified framework for latent communication in LLM-based multi-agent systems

arXiv:2606.05711v1 Announce Type: new Abstract: Multi-agent systems built on large language models (LLMs) have become a prevailing paradigm for tackling complex reasoning, planning, and tool-use tasks. The dominant communication protocol in such systems is natural language:…

24
arXiv — NLP / Computation & Language research 27d ago

Narrative Knowledge Weaver: Narrative-Centric Retrieval-Augmented Reasoning for Long-Form Text Understanding

arXiv:2606.05724v1 Announce Type: new Abstract: Long-form narrative QA requires reasoning over evolving story worlds rather than isolated passages: answers may depend on earlier goals, changing character states, social relations, causal triggers, temporal position, and later…

24
arXiv — NLP / Computation & Language research 27d ago

MARDoc: A Memory-Aware Refinement Agent Framework for Multimodal Long Document QA

arXiv:2606.05749v1 Announce Type: new Abstract: Iterative retrieval-reasoning agents have recently shown promise for multimodal long-document question answering. However, most existing systems maintain a single growing context that mixes retrieval traces, observations, and…

10
arXiv — NLP / Computation & Language research 27d ago

TARPO: Token-Wise Latent-Explicit Reasoning via Action-Routing Policy Optimization

arXiv:2606.05859v1 Announce Type: new Abstract: Latent reasoning has emerged as a promising alternative to discrete Chain-of-Thought (CoT) in large language models (LLMs), enabling more expressive reasoning by operating over continuous representations. However, the inherently…

7
arXiv — NLP / Computation & Language research 27d ago

IA-RAG: Interval-Algebra-Driven Temporal Reasoning for Dynamic Knowledge Retrieval

arXiv:2606.06044v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) has shown strong effectiveness in grounding Large Language Models (LLMs) with external knowledge. However, existing RAG and Graph RAG frameworks largely treat knowledge as static or associate…

13
arXiv — NLP / Computation & Language research 27d ago

SkillComposer: Learning to Evolve Agent Skills for Specification and Generalization

arXiv:2606.06079v1 Announce Type: new Abstract: Agent skills, which consist of reusable strategies that guide agent reasoning and action, have shown strong potential for improving model capability at inference time. However, current skill construction methods treat the problem…

18
arXiv — NLP / Computation & Language research 27d ago

Harnessing Structural Context for Entity Alignment Foundation Models

arXiv:2606.06109v1 Announce Type: new Abstract: Entity alignment (EA) aims to identify equivalent entities across heterogeneous knowledge graphs (KGs) and is a key component of knowledge fusion and cross-KG reasoning. The recent EA foundation model demonstrates that alignment…

6
arXiv — NLP / Computation & Language research 27d ago

The Tell-Tale Norm: $\ell_2$ Magnitude as a Signal for Reasoning Dynamics in Large Language Models

arXiv:2606.06188v1 Announce Type: new Abstract: Recent work has sought to understand Large Language Models (LLMs) reasoning, yet a principled, model-intrinsic signal that captures its layer-wise reasoning dynamics remains underexplored. We bridge this gap by demonstrating that…

38
arXiv — NLP / Computation & Language research 27d ago

Latent Reasoning with Normalizing Flows

arXiv:2606.06447v1 Announce Type: new Abstract: Large language models often improve reasoning by generating explicit chain-of-thought (CoT), demonstrating the importance of intermediate computation. However, textual CoT forces this computation through a discrete, serial, and…

15
Hugging Face Daily Papers research 27d ago

Is This Edit Correct? A Multi-Dimensional Benchmark for Reasoning-Aware Image Editing

Abstract RE-Edit benchmark evaluates image editing systems on five reasoning dimensions to assess logical consistency beyond visual plausibility. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Diffusion-based image editing has achieved strong visual fidelity under natural language…

6
Hugging Face Daily Papers research 27d ago

The Shadow Price of Reasoning: Economic Perspective on Optimal Budget Allocation for LLMs

Abstract Inference-time scaling is enhanced through constrained optimization that allocates computational resources based on economic principles, improving performance in resource-constrained environments. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Inference-time scaling has…

9
Hugging Face Daily Papers research 27d ago

Latent Reasoning with Normalizing Flows

Abstract Latent reasoning framework using normalizing flows preserves autoregressive generation advantages while enabling efficient, probabilistic intermediate computation in large language models. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large language models often improve…

26
Hugging Face Daily Papers research 27d ago

Imagine Before You Predict: Interleaved Latent Visual Reasoning for Video Event Prediction

Abstract Future-L1, an interleaved latent visual reasoning framework, improves video event prediction by maintaining visual semantics in latent space during autoregressive decoding, achieving state-of-the-art results on FutureBench and TwiFF-Bench benchmarks. Generated by…

20
Hugging Face Daily Papers research 27d ago

VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding

Abstract VideoKR presents a large-scale video reasoning dataset and benchmark designed to enhance knowledge-intensive video understanding through expert-domain content and human-in-the-loop example generation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We introduce VideoKR,…

24
Hugging Face Daily Papers research 27d ago

Unsupervised Skill Discovery for Agentic Data Analysis

Abstract DataCOPE is an unsupervised framework that discovers reusable data-analysis skills through verifier-guided exploration, improving analytical performance in both report-style and reasoning-style tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Inference-time skill…

28
r/LocalLLaMA community 28d ago

NVIDIA Nemotron 3 Ultra is out.

Not sure how much this is in the "local" world but interesting what they are putting out. https://developer.nvidia.com/blog/nvidia-nemotron-3-ultra-powers-faster-more-efficient-reasoning-for-long-running-agents/ submitted by /u/justdoitanddont

33
r/LocalLLaMA community 28d ago

KVarN: new KV-cache quant from Huawei. 3–5× KV cache compression with actual speed-up instead of slow-down, and unlike TurboQuant it holds up on reasoning (Apache 2.0, vLLM single flag)

The KV-cache quant race just got more interesting. Huawei just open-sourced KVarN, a KV-cache quantization method under Apache 2.0, drops into vLLM with one flag. Posting because the tradeoff it's claiming is genuinely different from what's already in the stack, and I'd like to…

20
Hugging Face Daily Papers research 28d ago

DAR: Deontic Reasoning with Agentic Harnesses

Abstract Deontic reasoning tasks require applying complex rules and policies, and an agentic approach enables models to dynamically access statutes, showing mixed performance improvements across different model strengths. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Deontic…

7
NVIDIA Developer Blog official-blog 28d ago

NVIDIA Nemotron 3 Ultra Powers Faster, More Efficient Reasoning for Long-Running Agents

Single-turn chatbots are evolving into long-running agents that can reason, maintain context, use tools, and run efficiently across many turns to complete...

33
Hugging Face Daily Papers research 28d ago

SpatialAct: Probing Spatial Reasoning-to-Action Capabilities of VLM Agents in 3D Scenes

Abstract Vision-language models demonstrate strong performance on isolated spatial reasoning tasks but fail to maintain coherent spatial understanding and reliable actions during multi-turn interactive feedback in 3D environments. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

15
Hugging Face Daily Papers research 28d ago

Economy of Minds: Emerging Multi-Agent Intelligence with Economic Interactions

Abstract Decentralized agent economies with auction-based competition and wealth accumulation enable emergent collective intelligence without central coordination, outperforming monolithic approaches in complex reasoning and optimization tasks. Generated by…

27
Vercel — AI dev-tools 28d ago

Nemotron 3 Ultra now available on AI Gateway

Nemotron 3 Ultra from Nvidia is now available on Vercel AI Gateway. Nemotron 3 Ultra is an open Mixture-of-Experts reasoning model built for orchestrating long-running agent workflows, with a 1M token context window. The model targets multi-turn agent workflows: planning, tool…

37
arXiv — Machine Learning research 28d ago

From Symbolic to Geometric: Enabling Spatial Reasoning in Large Language Models

arXiv:2606.04381v1 Announce Type: new Abstract: Recent large language models (LLMs) often appear to exhibit spatial reasoning ability; however, this capability is largely symbolic, arising from pattern matching over spatial language rather than true geometric…

34
arXiv — Machine Learning research 28d ago

Smart Picks in the Dark: Towards Efficient RLVR for Reasoning via Tracing Metacognitive Pivots

arXiv:2606.04503v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) has greatly advanced large reasoning models (LRMs), but it requires timely training on a huge fully-annotated dataset. To this end, data-efficient RLVR methods have been widely…

5
arXiv — Machine Learning research 28d ago

GeoMin: Data-Efficient Semi-Supervised RLVR via Geometric Distribution Modeling

arXiv:2606.04516v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) significantly advances LLM reasoning, yet it faces a dilemma: standard supervised scaling is throttled by high annotation costs, while unsupervised alternatives suffer from…

15
arXiv — Machine Learning research 28d ago

Rollout-Level Advantage-Prioritized Experience Replay for GRPO

arXiv:2606.04560v1 Announce Type: new Abstract: Reinforcement learning from verifiable rewards with GRPO is a standard approach for post-training reasoning LLMs. It remains sample inefficient. Each rollout is used for a single gradient update and then discarded. Naive replay is…

38
arXiv — NLP / Computation & Language research 28d ago

SaliMory: Orchestrating Cognitive Memory for Conversational Agents

arXiv:2606.04120v1 Announce Type: new Abstract: Conversational agents that serve as lifelong companions must maintain persistent memory across all interactions. However, simply expanding context windows with raw retrieval degrades reasoning quality, while training memory agents…

10
arXiv — NLP / Computation & Language research 28d ago

Deliberate Evolution: Agentic Reasoning for Sample-Efficient Symbolic Regression with LLMs

arXiv:2606.04360v1 Announce Type: new Abstract: Symbolic regression (SR) discovers compact mathematical expressions from data, yet recent LLM-based evolutionary methods remain sample-inefficient because they rely mainly on scalar feedback such as MSE. We identify a core…

37
arXiv — NLP / Computation & Language research 28d ago

MemoryDocDataSet: A Benchmark for Joint Conversational Memory and Long Document Reasoning

arXiv:2606.04442v1 Announce Type: new Abstract: AI systems increasingly need to combine two demanding capabilities: navigating multi-session conversation history and performing deep reading comprehension within long documents. Yet no existing benchmark evaluates both…

16
arXiv — NLP / Computation & Language research 28d ago

Stepwise Reasoning Enhancement for LLMs via External Subgraph Generation

arXiv:2606.04454v1 Announce Type: new Abstract: Large language models have shown strong performance in natural language generation and downstream reasoning tasks, but they still struggle with logical consistency, factual grounding, and interpretability in complex multi-step…

15
arXiv — NLP / Computation & Language research 28d ago

Learning What to Learn: Stage-Specific Data Sets for SFT-then-RL in Small Language Model Reasoning

arXiv:2606.04466v1 Announce Type: new Abstract: Post-training Small Language Models (SLMs) for reasoning typically follows an SFT-then-RL pipeline, yet existing work rarely considers what data should be learned at each stage. We argue that data strategy should be aligned with…

24
arXiv — NLP / Computation & Language research 28d ago

Entity Binding Failures in Speech LLM Reasoning: Diagnosis and Chain-of-Thought Intervention

arXiv:2606.04474v1 Announce Type: new Abstract: Speech Large Language Models (SLLMs) underperform their text counterparts on complex reasoning. We reveal that this modality gap is not a uniform cognitive deficit. Evaluating three diverse SLLMs, we show speech-to-text (S2T)…

37
arXiv — NLP / Computation & Language research 28d ago

Dynamic Infilling Anchors for Format-Constrained Generation in Diffusion Large Language Models

arXiv:2606.04535v1 Announce Type: new Abstract: Diffusion large language models (dLLMs) offer bidirectional attention and parallel generation, enabling them to exploit global context and naturally support format-constrained tasks like parseable JSON or reasoning templates. While…

16
arXiv — NLP / Computation & Language research 28d ago

GRAIL: Gradient-Reweighted Advantages for Reinforcement Learning with Verifiable Rewards

arXiv:2606.04889v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (e.g. GRPO) is now a common way to improve mathematical reasoning in Large Language Models (LLMs). However, current methods usually broadcast one sequence-level advantage to all…

8
arXiv — NLP / Computation & Language research 28d ago

Caliper: Probing Lexical Anchors versus Causal Structure in LLMs

arXiv:2606.04915v1 Announce Type: new Abstract: Large language models reach 50 to 70% accuracy on causal reasoning benchmarks such as CLadder, but it is unclear whether this reflects structural reasoning or lexical pattern matching. We introduce Caliper, a controlled…

18
arXiv — NLP / Computation & Language research 28d ago

DeliChess: A Multi-party Dialogue Dataset for Deliberation in Chess Puzzle Solving

arXiv:2606.04987v1 Announce Type: new Abstract: Multi-party dialogue is a critical setting for studying collaborative reasoning and decision-making, yet existing datasets rarely focus on structured, in-depth complex reasoning tasks. We introduce DeliChess, a novel dataset of…

35
arXiv — NLP / Computation & Language research 28d ago

DAR: Deontic Reasoning with Agentic Harnesses

arXiv:2606.05009v1 Announce Type: new Abstract: Deontic reasoning is the task of answering questions by applying explicit rules and policies to case-specific facts, for example computing tax liability under a statute or determining the outcome of an immigration appeal. A key…

22
arXiv — NLP / Computation & Language research 28d ago

Imbuing Large Language Models with Bidirectional Logic for Robust Chain Repair

arXiv:2606.05030v1 Announce Type: new Abstract: Autoregressive chain-of-thought (CoT) reasoning in large language models (LLMs) is fundamentally forward-directed: each step conditions only on prior tokens. This unidirectional inductive bias renders even capable models…

31
arXiv — NLP / Computation & Language research 28d ago

Boosting Self-Consistency with Ranking

arXiv:2606.05054v1 Announce Type: new Abstract: Self-consistency improves large language models by sampling multiple reasoning paths and selecting the most frequent answer, but majority voting often fails to recover correct answers that are already present among the samples. We…

33
arXiv — NLP / Computation & Language research 28d ago

Arithmetic Pedagogy for Language Models

arXiv:2606.05106v1 Announce Type: new Abstract: We investigate whether methods of human mathematics pedagogy can guide the training of language models toward arithmetic reasoning. Building on the GASING method -- an Indonesian pedagogy that solves basic arithmetic through a…

32
arXiv — NLP / Computation & Language research 28d ago

Streaming Communication in Multi-Agent Reasoning

arXiv:2606.05158v1 Announce Type: new Abstract: Multi-agent reasoning systems adopt a "generate-then-transfer" paradigm that forces end-to-end latency to scale linearly with pipeline depth. We introduce StreamMA, a multi-agent reasoning system that streams each reasoning step to…

8
arXiv — NLP / Computation & Language research 28d ago

VAMPS: Visual-Assisted Mathematical Problem Solving Benchmark

arXiv:2606.04244v1 Announce Type: cross Abstract: Multimodal large language models are increasingly capable of complex reasoning, yet their performance often degrades when they must externalize a problem through a tool and then reason over the tool's output, specifically when…

7
arXiv — NLP / Computation & Language research 28d ago

StepPRM-RTL: Stepwise Process-Reward Guided LLM Fine-Tuning for Enhanced RTL Synthesis

arXiv:2606.04246v1 Announce Type: cross Abstract: Automatic generation of RTL code for digital hardware designs remains challenging due to long-horizon reasoning, multi-step dependencies, and strict correctness constraints in Verilog and VHDL. We present StepPRM-RTL, a novel…

8
arXiv — NLP / Computation & Language research 28d ago

Cascading Hallucination in Agentic RAG: The CHARM Framework for Detection and Mitigation

arXiv:2606.04435v1 Announce Type: cross Abstract: Multi-step agentic retrieval-augmented generation (RAG) pipelines have demonstrated significant capability for complex reasoning tasks, yet remain vulnerable to a class of failure that existing hallucination detection mechanisms…

25
r/MachineLearning community 28d ago

Best Visual Reasoning Model in 2026 (Including APIs) [D]

For example, suppose I have a one-hour video and I provide it to ChatGPT or another AI model. If I ask complex reasoning questions about the video, which models are best suited for long-horizon video understanding and reasoning? Which models can produce the most reliable answers…

38
Hugging Face Daily Papers research 28d ago

ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning

Abstract ThoughtFold addresses over-thinking in large reasoning models by using fine-grained preference learning to identify and eliminate redundant explorations in chain-of-thought reasoning processes. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large Reasoning Models (LRMs)…

13
