500 articles archived under #reasoning

arXiv — Machine Learning research 23d ago ConSteer-RL: Steering Reasoning Capabilities in Large Language Models via Confidence-Aware Reinforcement Learning arXiv:2606.08088v1 Announce Type: new Abstract: Reinforcement Learning from Verifiable Rewards (RLVR) has recently become a key paradigm for improving the reasoning abilities of Large Language Models (LLMs), yet it remains limited by sparse binary rewards and its ignorance of… 28

Hugging Face Daily Papers research 23d ago SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks Abstract SpatialWorld presents a unified benchmark for evaluating interactive spatial understanding in multimodal agents through diverse real-world tasks with partial observability and text-based actions. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Spatial reasoning is a… 7

Hugging Face Daily Papers research 24d ago Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models Abstract Imaginative Perception Tokens (IPT) enhance vision-language models' spatial reasoning by providing intermediate perceptual representations that externalize what the model would perceive from alternative viewpoints, outperforming traditional text-based reasoning methods.… 22

Hugging Face Daily Papers research 24d ago CORE: Contrastive Reflection Enables Rapid Improvements in Reasoning Abstract Contrastive Reflection (CORE) improves language model reasoning by analyzing differences between successful and unsuccessful attempts to generate concise, interpretable insights that enable faster and more efficient self-improvement compared to traditional parametric… 21

r/LocalLLaMA community 24d ago Nex N2 has a funny "few words do trick" reasoning I've been playing with Nex N2 Pro (Qwen 3.5 397B finetune) locally today.
I noticed straight away that it has a pattern of reasoning that is distinct and uses simple words like "need" and "maybe" a lot. Here's a sample of reasoning. We need answer user asks "what is the theory… 16

Hugging Face Daily Papers research 24d ago Reinforcement Learning from Rich Feedback with Distributional DAgger Abstract Forward cross-entropy objective with distributional imitation learning enables monotonic policy improvement and better performance in reasoning tasks compared to traditional reinforcement learning methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Reasoning models… 15

Hugging Face Daily Papers research 24d ago Towards Human-Like Interactive Speech Recognition With Agentic Correction and Semantic Evaluation Abstract Interactive ASR framework integrates semantic correction and reasoning-based editing to reduce semantic errors through multi-turn refinement, validated by a new sentence-level semantic error rate metric and interactive simulation system. Generated by… 35

Hugging Face Daily Papers research 24d ago Compress-Distill: Reasoning Trace Compression for Efficient Knowledge Distillation Abstract Post-hoc compression of reasoning traces reduces computational costs and inference lengths while maintaining high accuracy, offering an accuracy-efficiency trade-off in knowledge distillation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Reasoning models produce long… 24

Hugging Face Daily Papers research 24d ago Critic-R: Improving Agentic Search using Instruction-tuned Retrievers with Natural Language Introspective Feedback Abstract Critic-R framework enhances agentic search by closing the feedback loop between reasoning agents and retrieval models through critic evaluation and dual optimization mechanisms.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct Agentic search systems iteratively interact… 34

arXiv — Machine Learning research 24d ago TALAN: Task-Aligned Latent Adaptation Networks for Targeted Post-Training of Large Language Models arXiv:2606.06902v1 Announce Type: new Abstract: Targeted post-training aims to improve reasoning, math, and code without degrading strengths. Low-rank adapters are efficient but task-global; activation interventions are input-aware but often require separate probes, vectors, or… 21

arXiv — Machine Learning research 24d ago The Fine-Tuning Trap: Evaluating Negative Transfer and the Role of PEFT in Sub-1B Mathematical Reasoning arXiv:2606.06920v1 Announce Type: new Abstract: Deploying Small Language Models (SLMs) on edge devices requires efficient fine-tuning strategies that adapt models to new tasks without degrading their general capabilities. In this study, we benchmark five sub-1B models (135M-1B)… 17

arXiv — NLP / Computation & Language research 24d ago RASFT: Rollout-Adaptive Supervised Fine-Tuning for Reasoning arXiv:2606.07006v1 Announce Type: cross Abstract: Supervised fine-tuning (SFT) is a prevailing method for adapting large language models to reasoning tasks by imitating offline expert demonstrations, often treating a single expert trajectory as the target behavior. However,… 15

arXiv — Machine Learning research 24d ago On the Geometry of On-Policy Distillation arXiv:2606.07082v1 Announce Type: new Abstract: On-policy distillation (OPD) is increasingly used to improve large language model reasoning, but its training dynamics remain poorly understood.
We characterize the trajectory of OPD updates in parameter space and compare it with… 10

arXiv — Machine Learning research 24d ago A Comprehensive Anatomy of Human and DeepSeek-R1 LLM Mathematical Reasoning arXiv:2606.07410v1 Announce Type: new Abstract: The emergence of "Aha moments" in large language models, particularly DeepSeek-R1-0120, has raised the question of whether these systems genuinely reason or merely imitate the appearance of reasoning. We conduct a comprehensive… 18

arXiv — NLP / Computation & Language research 24d ago How Language Models Fail: Token-Level Signatures of Committed and Persistent Reasoning Failures arXiv:2606.06635v1 Announce Type: new Abstract: Failures in language model reasoning emerge through distinct processes that leave identifiable signatures in the reasoning trace. We characterize these failures using token-level uncertainty signals, finding they arise through two… 24

arXiv — NLP / Computation & Language research 24d ago CAF-Gen: A Multi-Agent System for Enriching Argumentation Structures arXiv:2606.06646v1 Announce Type: new Abstract: Formalizing complex reasoning from natural text is one of the central challenges in computational linguistics.
It requires systems to understand not just keywords but also the context and complex reasoning embedded in a text.… 10

arXiv — NLP / Computation & Language research 24d ago Signal-Driven Observation for Long-Horizon Web Agents arXiv:2606.06708v1 Announce Type: new Abstract: Web agents operating over long horizons ingest raw DOM and accessibility trees -- routinely tens of thousands of tokens -- at every action step, causing progressive context degradation that erodes reasoning well before tasks… 7

arXiv — NLP / Computation & Language research 24d ago When to Think Deeply: Inhibitory Deliberation for LLM Reasoning arXiv:2606.06745v1 Announce Type: new Abstract: Reasoning Large Language Models can improve problem-solving performance through deliberative inference, but invoking slow reasoning for every input is computationally expensive and often unnecessary. We propose IDPR, a framework… 25

arXiv — NLP / Computation & Language research 24d ago Characterize Then Distill: Mechanistic Reasoning in Large Output Spaces arXiv:2606.06840v1 Announce Type: new Abstract: Modern reasoning models offer surprisingly strong zero-shot performance on challenging multi-label tasks that require selecting a small set of relevant options from hundreds of thousands to millions of candidate labels. We… 30

arXiv — NLP / Computation & Language research 24d ago CRAFT: A Unified Counterfactual Reasoning Framework for Tabular Question Answering and Fact Verification arXiv:2606.06842v1 Announce Type: new Abstract: Table reasoning remains challenging for large language models (LLMs), particularly in tasks that require multi-step inference over long and structured tables. Existing approaches predominantly rely on single-direction reasoning,… 34

arXiv — NLP / Computation & Language research 24d ago Are Large Language Models Suitable for Graph Computation?
Progress and Prospects arXiv:2606.06865v1 Announce Type: new Abstract: Large language models (LLMs) have been increasingly explored for graph computation, where tasks require reasoning over structured relationships and algorithmic operations. Yet, it remains unclear when LLMs can reliably support such… 28

arXiv — NLP / Computation & Language research 24d ago ThinkBooster: A Unified Framework for Seamless Test-Time Scaling of LLM Reasoning arXiv:2606.06915v1 Announce Type: new Abstract: Test-time compute (TTC) scaling has emerged as a powerful paradigm for improving large language model (LLM) reasoning by allocating additional compute during inference, e.g., via multi-sample generation and verifier-based… 34

arXiv — NLP / Computation & Language research 24d ago TRACE: Trajectory Reasoning through Adaptive Cross-Step Evidence Aggregation for LLM Agents arXiv:2606.07054v1 Announce Type: new Abstract: Autonomous LLM agents can pursue hidden malicious objectives through sequences of individually benign actions, making sabotage difficult to detect using standard trajectory-level monitoring. Existing approaches either evaluate… 22

arXiv — NLP / Computation & Language research 24d ago mmPISA-bench: Do LLMs Reason Equally Well Across 43 Languages? arXiv:2606.07069v1 Announce Type: new Abstract: We introduce mmPISA-bench, a compact high-quality multilingual reasoning benchmark derived from the OECD Programme for International Student Assessment (PISA). The benchmark consists of 25 multiple-choice questions that require… 19

arXiv — NLP / Computation & Language research 24d ago From Correctness to Utility: Gain-Based Prefix Evaluation for LLM Reasoning arXiv:2606.07190v1 Announce Type: new Abstract: Reasoning prefixes shape the future trajectory of LLM problem solving, yet existing process reward models usually evaluate them through local step correctness.
We argue that correctness is a useful but indirect proxy for the effect… 21

arXiv — NLP / Computation & Language research 24d ago M$^3$Exam: Benchmarking Multimodal Memory for Realistic User-Agent Interactions arXiv:2606.07402v1 Announce Type: new Abstract: Language agents are increasingly deployed over accumulating multimodal information, yet existing benchmarks assume a human-human form with sparse visuals and straightforward content, evaluating neither reasoning over authentic… 19

arXiv — NLP / Computation & Language research 24d ago How reliable are LLMs when it comes to playing dice? arXiv:2606.07515v1 Announce Type: new Abstract: We investigate the probabilistic reasoning capabilities of large language models through a controlled benchmarking study on discrete probability problems. We constructed two datasets, respectively a set of standard exercises and a… 33

arXiv — NLP / Computation & Language research 24d ago MADRAG: Multi-Agent Debate with Retrieval-Augmented Generation for Training-Free Analytic Essay Scoring arXiv:2606.06754v1 Announce Type: cross Abstract: We present MADRAG, a training-free framework for analytic essay scoring that combines multi-agent reasoning with retrieval-augmented grounding. Unlike standard LLM-as-judge approaches, which are prone to bias and unstable… 10

arXiv — NLP / Computation & Language research 24d ago Textual Supervision Enhances Geospatial Representations in Vision-Language Models arXiv:2606.07172v1 Announce Type: cross Abstract: Geospatial understanding is a critical yet underexplored dimension in the development of machine learning systems for tasks such as image geolocation and spatial reasoning.
In this work, we analyze the geospatial representations… 18

arXiv — NLP / Computation & Language research 24d ago MemDreamer: Decoupling Perception and Reasoning for Long Video Understanding via Hierarchical Graph Memory and Agentic Retrieval Mechanism arXiv:2606.07512v1 Announce Type: cross Abstract: Current Vision-Language Models struggle with hours-long videos because processing full-length visual sequences induces prohibitive token explosion and attention dilution. To overcome this, we introduce MemDreamer to decouple… 14

arXiv — NLP / Computation & Language research 24d ago AutoTool: Dynamic Tool Selection and Integration for Agentic Reasoning arXiv:2512.13278v2 Announce Type: replace Abstract: Agentic reinforcement learning has advanced large language models (LLMs) to reason through long chain-of-thought trajectories while interleaving external tool use. Existing approaches assume a fixed inventory of tools, which… 10

arXiv — NLP / Computation & Language research 24d ago SEEK: Steering LLM Reasoning for RAG via Internal Reasoning Sketches arXiv:2601.09402v2 Announce Type: replace Abstract: Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by incorporating external knowledge into the generation process.
Benefiting from the reasoning capabilities of LLMs, existing methods have leveraged… 8

arXiv — NLP / Computation & Language research 24d ago Mechanistic Evidence for Faithfulness Decay in Chain-of-Thought Reasoning arXiv:2602.11201v2 Announce Type: replace Abstract: Chain-of-Thought (CoT) explanations are widely used to interpret how language models solve complex problems, yet it remains unclear whether these step-by-step explanations reflect how the model actually reaches its answer, or… 22

Hugging Face Daily Papers research 24d ago Thinking with Imagination: Agentic Visual Spatial Reasoning with World Simulators Abstract Astra is an agentic spatial reasoning framework that enhances Vision-Language Models with action-conditioned visual imagination by coupling a reinforcement learning-trained policy with a world simulator for generating novel-view observations. Generated by… 22

Hugging Face Daily Papers research 24d ago Watch, Remember, Reason: Human-View Video Understanding with MLLMs Abstract Multimodal large language models for video understanding are structured around three core capabilities—watching, remembering, and reasoning—with applications spanning multiple video domains and addressing challenges in perception, memory, and reasoning. Generated by… 8

Hugging Face Daily Papers research 24d ago WorldBench: A Challenging and Visually Diverse Multimodal Reasoning Benchmark Abstract WorldBench is introduced as a visually diverse reasoning benchmark for evaluating multimodal large language models, revealing significant limitations in current models' visual understanding capabilities.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct In real-world… 11

llama.cpp releases dev-tools 26d ago b9544 common/chat : fix LFM2/LFM2.5 reasoning round-trip and leak (#24234) common/chat : fix LFM2 reasoning round-trip and stray leak Gate by reasoning format and whether the template supports macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled)… 30

r/LocalLLaMA community 26d ago Z.ai, we need Air! GLM GGUF wen? First we never saw an upgraded Air model after 4.5. Then GLM 4.7 Turbo was great, but quickly surpassed for coding. Now GLM 5.1 is a coding beast, but too huge for most to run locally, and even slow on API. Will we ever get another Air model with frontier reasoning and… 23

r/LocalLLaMA community 27d ago I implemented KVarN in my llama.cpp fork and ran KLD benchmarks. It's promising! Saw this post here yesterday: KVarN: new KV-cache quant from Huawei. 3–5× KV cache compression with actual speed-up instead of slow-down, and unlike TurboQuant it holds up on reasoning (Apache 2.0, vLLM single flag) Cheap KV cache with good precision? Sign me up! Oh, vLLM… 12

Hugging Face Daily Papers research 27d ago Discrete-WAM: Unified Discrete Vision-Action Token Editing for World-Policy Learning Abstract Discrete-WAM introduces a unified discrete latent vision-action world policy that enables compositional causal reasoning and counterfactual reasoning in autonomous driving through aligned discrete tokens and a shared discrete diffusion framework. Generated by… 29

Hugging Face Daily Papers research 27d ago World-Language-Action Model for Unified World Modeling, Language Reasoning, and Action Synthesis Abstract World-language-action models combine textual instruction processing with robot state prediction through an autoregressive transformer backbone, enabling efficient long-horizon task execution and cross-embodiment learning.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct We… 7

r/LocalLLaMA community 27d ago [NEW MODEL] SupraLabs just released a new model! - Supra-50M-Reasoning Hello again r/LocalLLaMA ! Supra-50M-Reasoning (ThinkSupra-50M) is the reasoning version of Supra-50M-Instruct. It produces a full thinking chain before every answer, fine-tuned from Supra-50M-Base using a custom… 14

r/LocalLLaMA community 27d ago Benchmark & Reality Check on Gemma 4 12B: Great model, but your local settings are probably breaking it (Fix inside) I completed a Python bug hunting benchmark with Gemma 4 12B. I used the Unsloth Dynamic Q5 GGUF model. The model has good capabilities. Default settings in LM Studio disable the reasoning. Fix the LM Studio reasoning configuration. LM Studio looks for Qwen tokens. Gemma 4 uses… 30

Hugging Face Daily Papers research 27d ago Multimodal Music Recommendation System using LLMs Abstract A multimodal framework for session-based music recommendation integrates audio, lyric, and semantic signals with LLM-based sequential reasoning to improve recommendation accuracy. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Music recommendation systems typically treat… 16

arXiv — Machine Learning research 27d ago State commitment learning: training language models to distinguish computation from memory arXiv:2606.05201v1 Announce Type: new Abstract: Reasoning language models do not distinguish tokens used for computation from tokens that constitute persistent state: once generated, all hidden thoughts remain in context and influence future predictions.
As a result, downstream… 19

arXiv — Machine Learning research 27d ago Policy-Conditioned Counterfactual Credit for Verifiable Reinforcement Learning of Long-Horizon Language Agents arXiv:2606.05263v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards improves reasoning and tool use, yet long-horizon language agents still learn unsupported evidence chains, belief drift, and shortcut actions that satisfy terminal checks. Existing… 5

arXiv — Machine Learning research 27d ago Selective-Advantage Entropy-Adaptive Horizon GRPO: Asymmetric Token-Level Discounting for Efficient Reinforcement Learning of Language Models arXiv:2606.05434v1 Announce Type: new Abstract: Group Relative Policy Optimisation (GRPO) has emerged as an effective reinforcement-learning algorithm for aligning language models on reasoning tasks, but it treats every token position and every sampled rollout symmetrically. We… 17

arXiv — Machine Learning research 27d ago What Objects Enable, Not What They Are: Functional Latent Spaces for Affordance Reasoning arXiv:2606.05533v1 Announce Type: new Abstract: Existing robot planning systems rely on appearance-based reasoning, where visual observations are encoded into latent spaces organized around object appearances (e.g., recognizing a "cart" based on how it looks). However, planning… 13

arXiv — Machine Learning research 27d ago Compress-Distill: Reasoning Trace Compression for Efficient Knowledge Distillation arXiv:2606.05988v1 Announce Type: new Abstract: Reasoning models produce long chain-of-thought traces that are costly to distill and encourage verbose student outputs. We study post-hoc compression of such traces before knowledge distillation.
Two teachers, Qwen3.5-397B-A17B and… 30

arXiv — Machine Learning research 27d ago HoT-SSM: Higher-order Temporal Knowledge Graph Reasoning with State Space Models for Health Care arXiv:2606.05994v1 Announce Type: new Abstract: Medical knowledge graphs (MKGs) infused with clinical knowledge have been increasingly used to model electronic health records (EHRs) to support interpretable predictions in healthcare domain. However, existing MKG-based approaches… 31

Page 9 of 10