Tag Reasoning: 500 articles archived under #reasoning

arXiv — Machine Learning research 27d ago On Advantage Estimates for Max@K Policy Gradients arXiv:2606.06080v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards is widely used for post-training reasoning models, but sparse outcome rewards make exploration difficult. A complementary approach is to optimize inference-time objectives such as… 19

arXiv — NLP / Computation & Language research 27d ago Multi-Granularity Reasoning for Natural Language Inference arXiv:2606.05181v1 Announce Type: new Abstract: Natural Language Inference (NLI) is a fundamental task in natural language understanding that requires determining the logical relationship between a premise and a hypothesis. Despite the remarkable success of transformer-based… 31

arXiv — NLP / Computation & Language research 27d ago LoRi: Low-Rank Distillation for Implicit Reasoning arXiv:2606.05315v1 Announce Type: new Abstract: Implicit chain-of-thought (iCoT) methods aim to internalize reasoning in large language models, but often underperform explicit CoT prompting. We empirically find that hidden-state reasoning trajectories exhibit low-rank structure.… 36

arXiv — NLP / Computation & Language research 27d ago ReasoningFlow: Discourse Structures for Understanding LLM Reasoning Traces arXiv:2606.05402v1 Announce Type: new Abstract: Large reasoning models (LRMs) produce reasoning traces with non-linear structures, such as backtracking and self-correction, that complicate the evaluation and monitoring of the reasoning process.
We introduce ReasoningFlow, a… 30

arXiv — NLP / Computation & Language research 27d ago Beyond tokens: a unified framework for latent communication in LLM-based multi-agent systems arXiv:2606.05711v1 Announce Type: new Abstract: Multi-agent systems built on large language models (LLMs) have become a prevailing paradigm for tackling complex reasoning, planning, and tool-use tasks. The dominant communication protocol in such systems is natural language:… 24

arXiv — NLP / Computation & Language research 27d ago Narrative Knowledge Weaver: Narrative-Centric Retrieval-Augmented Reasoning for Long-Form Text Understanding arXiv:2606.05724v1 Announce Type: new Abstract: Long-form narrative QA requires reasoning over evolving story worlds rather than isolated passages: answers may depend on earlier goals, changing character states, social relations, causal triggers, temporal position, and later… 24

arXiv — NLP / Computation & Language research 27d ago MARDoc: A Memory-Aware Refinement Agent Framework for Multimodal Long Document QA arXiv:2606.05749v1 Announce Type: new Abstract: Iterative retrieval-reasoning agents have recently shown promise for multimodal long-document question answering. However, most existing systems maintain a single growing context that mixes retrieval traces, observations, and… 10

arXiv — NLP / Computation & Language research 27d ago TARPO: Token-Wise Latent-Explicit Reasoning via Action-Routing Policy Optimization arXiv:2606.05859v1 Announce Type: new Abstract: Latent reasoning has emerged as a promising alternative to discrete Chain-of-Thought (CoT) in large language models (LLMs), enabling more expressive reasoning by operating over continuous representations.
However, the inherently… 7

arXiv — NLP / Computation & Language research 27d ago IA-RAG: Interval-Algebra-Driven Temporal Reasoning for Dynamic Knowledge Retrieval arXiv:2606.06044v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) has shown strong effectiveness in grounding Large Language Models (LLMs) with external knowledge. However, existing RAG and Graph RAG frameworks largely treat knowledge as static or associate… 13

arXiv — NLP / Computation & Language research 27d ago SkillComposer: Learning to Evolve Agent Skills for Specification and Generalization arXiv:2606.06079v1 Announce Type: new Abstract: Agent skills, which consist of reusable strategies that guide agent reasoning and action, have shown strong potential for improving model capability at inference time. However, current skill construction methods treat the problem… 18

arXiv — NLP / Computation & Language research 27d ago Harnessing Structural Context for Entity Alignment Foundation Models arXiv:2606.06109v1 Announce Type: new Abstract: Entity alignment (EA) aims to identify equivalent entities across heterogeneous knowledge graphs (KGs) and is a key component of knowledge fusion and cross-KG reasoning. The recent EA foundation model demonstrates that alignment… 6

arXiv — NLP / Computation & Language research 27d ago The Tell-Tale Norm: $\ell_2$ Magnitude as a Signal for Reasoning Dynamics in Large Language Models arXiv:2606.06188v1 Announce Type: new Abstract: Recent work has sought to understand Large Language Models (LLMs) reasoning, yet a principled, model-intrinsic signal that captures its layer-wise reasoning dynamics remains underexplored. We bridge this gap by demonstrating that… 38

arXiv — NLP / Computation & Language research 27d ago Latent Reasoning with Normalizing Flows arXiv:2606.06447v1 Announce Type: new Abstract: Large language models often improve reasoning by generating explicit chain-of-thought (CoT), demonstrating the importance of intermediate computation.
However, textual CoT forces this computation through a discrete, serial, and… 15

Hugging Face Daily Papers research 27d ago Is This Edit Correct? A Multi-Dimensional Benchmark for Reasoning-Aware Image Editing Abstract RE-Edit benchmark evaluates image editing systems on five reasoning dimensions to assess logical consistency beyond visual plausibility. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Diffusion-based image editing has achieved strong visual fidelity under natural language… 6

Hugging Face Daily Papers research 27d ago The Shadow Price of Reasoning: Economic Perspective on Optimal Budget Allocation for LLMs Abstract Inference-time scaling is enhanced through constrained optimization that allocates computational resources based on economic principles, improving performance in resource-constrained environments. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Inference-time scaling has… 9

Hugging Face Daily Papers research 27d ago Latent Reasoning with Normalizing Flows Abstract Latent reasoning framework using normalizing flows preserves autoregressive generation advantages while enabling efficient, probabilistic intermediate computation in large language models. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large language models often improve… 26

Hugging Face Daily Papers research 27d ago Imagine Before You Predict: Interleaved Latent Visual Reasoning for Video Event Prediction Abstract Future-L1, an interleaved latent visual reasoning framework, improves video event prediction by maintaining visual semantics in latent space during autoregressive decoding, achieving state-of-the-art results on FutureBench and TwiFF-Bench benchmarks.
Generated by… 20

Hugging Face Daily Papers research 27d ago VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding Abstract VideoKR presents a large-scale video reasoning dataset and benchmark designed to enhance knowledge-intensive video understanding through expert-domain content and human-in-the-loop example generation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We introduce VideoKR,… 24

Hugging Face Daily Papers research 27d ago Unsupervised Skill Discovery for Agentic Data Analysis Abstract DataCOPE is an unsupervised framework that discovers reusable data-analysis skills through verifier-guided exploration, improving analytical performance in both report-style and reasoning-style tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Inference-time skill… 28

r/LocalLLaMA community 28d ago NVIDIA Nemotron 3 Ultra is out. Not sure how much this is in the "local" world but interesting what they are putting out. https://developer.nvidia.com/blog/nvidia-nemotron-3-ultra-powers-faster-more-efficient-reasoning-for-long-running-agents/ submitted by /u/justdoitanddont 33

r/LocalLLaMA community 28d ago KVarN: new KV-cache quant from Huawei. 3–5× KV cache compression with actual speed-up instead of slow-down, and unlike TurboQuant it holds up on reasoning (Apache 2.0, vLLM single flag) The KV-cache quant race just got more interesting. Huawei just open-sourced KVarN, a KV-cache quantization method under Apache 2.0, drops into vLLM with one flag. Posting because the tradeoff it's claiming is genuinely different from what's already in the stack, and I'd like to… 20

Hugging Face Daily Papers research 28d ago DAR: Deontic Reasoning with Agentic Harnesses Abstract Deontic reasoning tasks require applying complex rules and policies, and an agentic approach enables models to dynamically access statutes, showing mixed performance improvements across different model strengths.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct Deontic… 7

NVIDIA Developer Blog official-blog 28d ago NVIDIA Nemotron 3 Ultra Powers Faster, More Efficient Reasoning for Long-Running Agents Single-turn chatbots are evolving into long-running agents that can reason, maintain context, use tools, and run efficiently across many turns to complete... 33

Hugging Face Daily Papers research 28d ago SpatialAct: Probing Spatial Reasoning-to-Action Capabilities of VLM Agents in 3D Scenes Abstract Vision-language models demonstrate strong performance on isolated spatial reasoning tasks but fail to maintain coherent spatial understanding and reliable actions during multi-turn interactive feedback in 3D environments. Generated by Qwen/Qwen2.5-Coder-32B-Instruct… 15

Hugging Face Daily Papers research 28d ago Economy of Minds: Emerging Multi-Agent Intelligence with Economic Interactions Abstract Decentralized agent economies with auction-based competition and wealth accumulation enable emergent collective intelligence without central coordination, outperforming monolithic approaches in complex reasoning and optimization tasks. Generated by… 27

Vercel — AI dev-tools 28d ago Nemotron 3 Ultra now available on AI Gateway Nemotron 3 Ultra from Nvidia is now available on Vercel AI Gateway. Nemotron 3 Ultra is an open Mixture-of-Experts reasoning model built for orchestrating long-running agent workflows, with a 1M token context window.
The model targets multi-turn agent workflows: planning, tool… 37

arXiv — Machine Learning research 28d ago From Symbolic to Geometric: Enabling Spatial Reasoning in Large Language Models arXiv:2606.04381v1 Announce Type: new Abstract: Recent large language models (LLMs) often appear to exhibit spatial reasoning ability; however, this capability is largely symbolic, arising from pattern matching over spatial language rather than true geometric… 34

arXiv — Machine Learning research 28d ago Smart Picks in the Dark: Towards Efficient RLVR for Reasoning via Tracing Metacognitive Pivots arXiv:2606.04503v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) has greatly advanced large reasoning models (LRMs), but it requires timely training on a huge fully-annotated dataset. To this end, data-efficient RLVR methods have been widely… 5

arXiv — Machine Learning research 28d ago GeoMin: Data-Efficient Semi-Supervised RLVR via Geometric Distribution Modeling arXiv:2606.04516v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) significantly advances LLM reasoning, yet it faces a dilemma: standard supervised scaling is throttled by high annotation costs, while unsupervised alternatives suffer from… 15

arXiv — Machine Learning research 28d ago Rollout-Level Advantage-Prioritized Experience Replay for GRPO arXiv:2606.04560v1 Announce Type: new Abstract: Reinforcement learning from verifiable rewards with GRPO is a standard approach for post-training reasoning LLMs. It remains sample inefficient. Each rollout is used for a single gradient update and then discarded. Naive replay is… 38

arXiv — NLP / Computation & Language research 28d ago SaliMory: Orchestrating Cognitive Memory for Conversational Agents arXiv:2606.04120v1 Announce Type: new Abstract: Conversational agents that serve as lifelong companions must maintain persistent memory across all interactions.
However, simply expanding context windows with raw retrieval degrades reasoning quality, while training memory agents… 10

arXiv — NLP / Computation & Language research 28d ago Deliberate Evolution: Agentic Reasoning for Sample-Efficient Symbolic Regression with LLMs arXiv:2606.04360v1 Announce Type: new Abstract: Symbolic regression (SR) discovers compact mathematical expressions from data, yet recent LLM-based evolutionary methods remain sample-inefficient because they rely mainly on scalar feedback such as MSE. We identify a core… 37

arXiv — NLP / Computation & Language research 28d ago MemoryDocDataSet: A Benchmark for Joint Conversational Memory and Long Document Reasoning arXiv:2606.04442v1 Announce Type: new Abstract: AI systems increasingly need to combine two demanding capabilities: navigating multi-session conversation history and performing deep reading comprehension within long documents. Yet no existing benchmark evaluates both… 16

arXiv — NLP / Computation & Language research 28d ago Stepwise Reasoning Enhancement for LLMs via External Subgraph Generation arXiv:2606.04454v1 Announce Type: new Abstract: Large language models have shown strong performance in natural language generation and downstream reasoning tasks, but they still struggle with logical consistency, factual grounding, and interpretability in complex multi-step… 15

arXiv — NLP / Computation & Language research 28d ago Learning What to Learn: Stage-Specific Data Sets for SFT-then-RL in Small Language Model Reasoning arXiv:2606.04466v1 Announce Type: new Abstract: Post-training Small Language Models (SLMs) for reasoning typically follows an SFT-then-RL pipeline, yet existing work rarely considers what data should be learned at each stage.
We argue that data strategy should be aligned with… 24

arXiv — NLP / Computation & Language research 28d ago Entity Binding Failures in Speech LLM Reasoning: Diagnosis and Chain-of-Thought Intervention arXiv:2606.04474v1 Announce Type: new Abstract: Speech Large Language Models (SLLMs) underperform their text counterparts on complex reasoning. We reveal that this modality gap is not a uniform cognitive deficit. Evaluating three diverse SLLMs, we show speech-to-text (S2T)… 37

arXiv — NLP / Computation & Language research 28d ago Dynamic Infilling Anchors for Format-Constrained Generation in Diffusion Large Language Models arXiv:2606.04535v1 Announce Type: new Abstract: Diffusion large language models (dLLMs) offer bidirectional attention and parallel generation, enabling them to exploit global context and naturally support format-constrained tasks like parseable JSON or reasoning templates. While… 16

arXiv — NLP / Computation & Language research 28d ago GRAIL: Gradient-Reweighted Advantages for Reinforcement Learning with Verifiable Rewards arXiv:2606.04889v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (e.g. GRPO) is now a common way to improve mathematical reasoning in Large Language Models (LLMs). However, current methods usually broadcast one sequence-level advantage to all… 8

arXiv — NLP / Computation & Language research 28d ago Caliper: Probing Lexical Anchors versus Causal Structure in LLMs arXiv:2606.04915v1 Announce Type: new Abstract: Large language models reach 50 to 70% accuracy on causal reasoning benchmarks such as CLadder, but it is unclear whether this reflects structural reasoning or lexical pattern matching.
We introduce Caliper, a controlled… 18

arXiv — NLP / Computation & Language research 28d ago DeliChess: A Multi-party Dialogue Dataset for Deliberation in Chess Puzzle Solving arXiv:2606.04987v1 Announce Type: new Abstract: Multi-party dialogue is a critical setting for studying collaborative reasoning and decision-making, yet existing datasets rarely focus on structured, in-depth complex reasoning tasks. We introduce DeliChess, a novel dataset of… 35

arXiv — NLP / Computation & Language research 28d ago DAR: Deontic Reasoning with Agentic Harnesses arXiv:2606.05009v1 Announce Type: new Abstract: Deontic reasoning is the task of answering questions by applying explicit rules and policies to case-specific facts, for example computing tax liability under a statute or determining the outcome of an immigration appeal. A key… 22

arXiv — NLP / Computation & Language research 28d ago Imbuing Large Language Models with Bidirectional Logic for Robust Chain Repair arXiv:2606.05030v1 Announce Type: new Abstract: Autoregressive chain-of-thought (CoT) reasoning in large language models (LLMs) is fundamentally forward-directed: each step conditions only on prior tokens. This unidirectional inductive bias renders even capable models… 31

arXiv — NLP / Computation & Language research 28d ago Boosting Self-Consistency with Ranking arXiv:2606.05054v1 Announce Type: new Abstract: Self-consistency improves large language models by sampling multiple reasoning paths and selecting the most frequent answer, but majority voting often fails to recover correct answers that are already present among the samples. We… 33

arXiv — NLP / Computation & Language research 28d ago Arithmetic Pedagogy for Language Models arXiv:2606.05106v1 Announce Type: new Abstract: We investigate whether methods of human mathematics pedagogy can guide the training of language models toward arithmetic reasoning.
Building on the GASING method -- an Indonesian pedagogy that solves basic arithmetic through a… 32

arXiv — NLP / Computation & Language research 28d ago Streaming Communication in Multi-Agent Reasoning arXiv:2606.05158v1 Announce Type: new Abstract: Multi-agent reasoning systems adopt a "generate-then-transfer" paradigm that forces end-to-end latency to scale linearly with pipeline depth. We introduce StreamMA, a multi-agent reasoning system that streams each reasoning step to… 8

arXiv — NLP / Computation & Language research 28d ago VAMPS: Visual-Assisted Mathematical Problem Solving Benchmark arXiv:2606.04244v1 Announce Type: cross Abstract: Multimodal large language models are increasingly capable of complex reasoning, yet their performance often degrades when they must externalize a problem through a tool and then reason over the tool's output, specifically when… 7

arXiv — NLP / Computation & Language research 28d ago StepPRM-RTL: Stepwise Process-Reward Guided LLM Fine-Tuning for Enhanced RTL Synthesis arXiv:2606.04246v1 Announce Type: cross Abstract: Automatic generation of RTL code for digital hardware designs remains challenging due to long-horizon reasoning, multi-step dependencies, and strict correctness constraints in Verilog and VHDL. We present StepPRM-RTL, a novel… 8

arXiv — NLP / Computation & Language research 28d ago Cascading Hallucination in Agentic RAG: The CHARM Framework for Detection and Mitigation arXiv:2606.04435v1 Announce Type: cross Abstract: Multi-step agentic retrieval-augmented generation (RAG) pipelines have demonstrated significant capability for complex reasoning tasks, yet remain vulnerable to a class of failure that existing hallucination detection mechanisms… 25

r/MachineLearning community 28d ago Best Visual Reasoning Model in 2026 (Including APIs) [D] For example, suppose I have a one-hour video and I provide it to ChatGPT or another AI model.
If I ask complex reasoning questions about the video, which models are best suited for long-horizon video understanding and reasoning? Which models can produce the most reliable answers… 38

Hugging Face Daily Papers research 28d ago ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning Abstract ThoughtFold addresses over-thinking in large reasoning models by using fine-grained preference learning to identify and eliminate redundant explorations in chain-of-thought reasoning processes. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large Reasoning Models (LRMs)… 13