Tag

Reasoning

500 articles archived under #reasoning · RSS

arXiv — NLP / Computation & Language research 14d ago

PragReST: Self-Reinforcing Counterfactual Reasoning for Pragmatic Language Understanding

arXiv:2606.18624v1 Announce Type: new Abstract: Natural language understanding often depends on meanings that are implied rather than explicitly stated, requiring pragmatic reasoning. Despite strong performance on math and logical reasoning, large language models (LLMs) still…

6
arXiv — NLP / Computation & Language research 14d ago

TW-LegalBench: Measuring Taiwanese Legal Understanding

arXiv:2606.18699v1 Announce Type: new Abstract: Large language models (LLMs) have shown impressive capabilities across diverse tasks, yet their performance on jurisdiction-specific legal reasoning remains underexplored. We present TW-LegalBench that utilizes Taiwanese legal…

22
arXiv — NLP / Computation & Language research 14d ago

Beyond Reward Engineering: A Data Recipe for Long-Context Reinforcement Learning

arXiv:2606.18831v1 Announce Type: new Abstract: Long-context reasoning is an essential capability for large language models, particularly when they are deployed as autonomous agents that must reason over lengthy trajectories. Reinforcement learning (RL) has recently emerged as a…

36
arXiv — NLP / Computation & Language research 14d ago

ScholarSum: Student-Teacher Abstractive Summarization via Knowledge Graph Reasoning and Reflective Refinement

arXiv:2606.18850v1 Announce Type: new Abstract: Abstractive summarization plays a crucial role in enabling efficient understanding of scientific literature, yet it inherently demands both linguistic fluency and factual faithfulness. Existing approaches often fail to reconcile…

28
arXiv — NLP / Computation & Language research 14d ago

GraphPO: Graph-based Policy Optimization for Reasoning Models

arXiv:2606.18954v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has become a standard paradigm for enhancing the capability of large reasoning models. RLVR typically samples responses independently and optimizes the policy using from final…

9
arXiv — NLP / Computation & Language research 14d ago

Enhancing Multilingual Reasoning via Steerable Model Merging

arXiv:2606.19002v1 Announce Type: new Abstract: Model merging is an effective technique for composing the capabilities of a multilingual model and a reasoning model. It has achieved promising generalization in multilingual reasoning tasks by aligning feature spaces of different…

36
arXiv — NLP / Computation & Language research 14d ago

DreamReasoner-8B: Block-Size Curriculum Learning for Diffusion Reasoning Models

arXiv:2606.19257v1 Announce Type: new Abstract: Block diffusion language models accelerate decoding through parallel block-wise denoising, yet whether they can be reliably scaled for long chain-of-thought (CoT) reasoning remains unresolved. To this end, we develop…

18
arXiv — NLP / Computation & Language research 14d ago

Decoupling Search from Reasoning: A Vendor-Agnostic Grounding Architecture for LLM Agents

arXiv:2606.18947v1 Announce Type: cross Abstract: Production LLM agents increasingly depend on real-time search, yet native search grounding bundles retrieval policy, provider choice, evidence injection, cost, latency, and generation behavior behind a single model-provider…

20
arXiv — NLP / Computation & Language research 14d ago

STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability

arXiv:2606.19236v1 Announce Type: cross Abstract: Reinforcement Learning with Verifiable Rewards algorithms like GRPO have emerged as the dominant post-training paradigm for complex reasoning in LLMs, yet commonly suffer from policy entropy collapse during training. We conduct a…

35
arXiv — NLP / Computation & Language research 14d ago

Structured Inference with Large Language Gibbs

arXiv:2606.19264v1 Announce Type: cross Abstract: The knowledge encoded in large language models (LLMs) can serve as a substrate for structured reasoning over variables describing a complex world, but accessing this knowledge in a probabilistically coherent manner poses a…

8
arXiv — NLP / Computation & Language research 14d ago

Rethinking Reward Supervision: Rubric-Conditioned Self-Distillation

arXiv:2606.19327v1 Announce Type: cross Abstract: Post-training of reasoning language models is commonly driven by supervised distillation and reinforcement learning with verifiable rewards. Distillation often relies on chain-of-thought annotations that are expensive to obtain…

35
arXiv — NLP / Computation & Language research 14d ago

Native Active Perception as Reasoning for Omni-Modal Understanding

arXiv:2606.19341v1 Announce Type: cross Abstract: Passive models for long video understanding typically rely on a "watch-it-all" paradigm, processing frames uniformly regardless of query difficulty, causing computational cost to grow with video duration. Although interactive…

12
arXiv — NLP / Computation & Language research 14d ago

ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark

arXiv:2505.23851v3 Announce Type: replace Abstract: Large language models (LLMs) are increasingly applied to symbolic mathematics, yet existing evaluations often conflate pattern memorization with genuine reasoning. To address this gap, we present ASyMOB, a high-resolution…

38
arXiv — NLP / Computation & Language research 14d ago

UniECG: Understanding and Generating ECG in One Unified Model

arXiv:2509.18588v2 Announce Type: replace Abstract: Electrocardiogram (ECG) interpretation is a fundamental skill in medical education, yet students often need more than static examples to connect waveform evidence with diagnostic reasoning. This paper presents UniECG as a step…

38
arXiv — NLP / Computation & Language research 14d ago

ActMem: Bridging the Gap Between Memory Retrieval and Reasoning in LLM Agents

arXiv:2603.00026v2 Announce Type: replace Abstract: Memory management is essential for LLM agents in long-term interactions. Current memory frameworks typically treat agents as passive "recorders" and retrieve information without understanding its deeper implications. They may…

15
Hugging Face Daily Papers research 14d ago

Guava: An Effective and Universal Harness for Embodied Manipulation

Abstract A harness framework for embodied tool use combines high-level reasoning with external modules, enabling compact models to perform complex manipulation tasks with minimal training data. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Language models trained on large-scale…

15
Hugging Face Daily Papers research 14d ago

Sumi: Open Uniform Diffusion Language Model from Scratch

Abstract A large-scale uniform diffusion language model pretrained from scratch demonstrates competitive performance on knowledge and reasoning tasks while highlighting differences in commonsense reasoning compared to autoregressive models. Generated by…

15
Hugging Face Daily Papers research 15d ago

Visual-Seeker: Towards Visual-Native Multimodal Agentic Search via Active Visual Reasoning

Abstract Visual-Seeker enables visual-native multimodal deep search through active visual reasoning, outperforming proprietary models on real-world web environments. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multimodal large language models (MLLMs) have demonstrated…

25
arXiv — Machine Learning research 15d ago

Learning to Refine Hidden States for Reliable LLM Reasoning

arXiv:2606.17524v1 Announce Type: new Abstract: Large language models show strong reasoning ability, but their internal reasoning process can remain unstable in complex multi-step settings, where early hidden-state errors may propagate to incorrect predictions. We propose ReLAR,…

35
arXiv — Machine Learning research 15d ago

Continual Self-Improvement with Lightweight Experiential Latent Memories

arXiv:2606.17803v1 Announce Type: new Abstract: Large language models achieve strong reasoning performance by scaling inference-time compute, yet remain fundamentally stateless, discarding the rich, self-produced reasoning traces generated during this process. We investigate…

21
arXiv — Machine Learning research 15d ago

From Reasoning Traces to Reusable Modules: Understanding Compositional Generalization in Language Model Reasoning

arXiv:2606.18089v1 Announce Type: new Abstract: Post-training pipelines that combine supervised fine-tuning (SFT) with reinforcement learning (RL) have emerged as the key recipe for transforming large language models (LLMs) into robust reasoners. We argue that this combined…

17
arXiv — NLP / Computation & Language research 15d ago

Decoding Hidden Deception in Reasoning LLMs: Activation Explainers for Deception Auditing

arXiv:2606.17478v1 Announce Type: new Abstract: As LLMs acquire stronger reasoning capabilities, deceptive behavior becomes an increasingly serious safety concern. Existing deception monitors either score visible transcripts or derive scalar probe scores from representation…

23
arXiv — NLP / Computation & Language research 15d ago

From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning

arXiv:2606.17682v1 Announce Type: new Abstract: Reinforcement learning pipelines for Large Language Model (LLM) training often rely on manually redesigned environments between stages, requiring practitioners to heuristically infer which configuration will best improve the…

28
arXiv — NLP / Computation & Language research 15d ago

SuCo: Sufficiency-guided Continuous Adaptive Reasoning

arXiv:2606.17687v1 Announce Type: new Abstract: Despite remarkable performance on complex tasks, Large Reasoning Models (LRMs) often generate excessively long Chain-of-Thoughts (CoT), inflating computational costs even for simple queries. Existing efforts to mitigate this…

21
arXiv — NLP / Computation & Language research 15d ago

Dynamic Rollout Editing for Reducing Overthinking in RL-Trained Reasoning Models

arXiv:2606.17890v1 Announce Type: new Abstract: Long-form chain-of-thought reasoning can improve LLM performance on complex tasks, but models often continue generating unnecessary reasoning after a correct answer has emerged. We refer to this behavior as overthinking. We study…

29
arXiv — NLP / Computation & Language research 15d ago

ChLogic: Evaluating Robustness of Logical Reasoning in Chinese Expressions

arXiv:2606.17905v1 Announce Type: new Abstract: Large language models perform increasingly well on standardized logical reasoning benchmarks, but whether this ability remains robust beyond English is unclear. We introduce ChLogic, an English-Chinese aligned benchmark that tests…

10
arXiv — NLP / Computation & Language research 15d ago

Visuals Lie, Consistency Speaks: Disentangling Spatial Attention from Reliability in Vision-Language Models

arXiv:2606.17389v1 Announce Type: cross Abstract: Multimodal Foundation Models are increasingly used as reasoning agents, making reliability, knowing when a model may hallucinate, critical. A common intuition, which we call the Attention-Confidence Assumption, holds that…

24
arXiv — NLP / Computation & Language research 15d ago

The Measurement Gap in the Automation of EU Law: Benchmarking Doctrinal Legal Reasoning under the EU AI Act

arXiv:2606.18158v1 Announce Type: cross Abstract: Large language models now produce legal text of at least median quality, yet no existing benchmark can evaluate whether they perform doctrinal legal reasoning, which forms the interpretive core of legal work, rather than the…

38
arXiv — NLP / Computation & Language research 15d ago

MedicalAgentsBench for Complex Medical Reasoning: Comparing Internalized Reasoning Models versus Externalized Agent-based Frameworks

arXiv:2503.07459v3 Announce Type: replace Abstract: Complex medical reasoning requires integrating heterogeneous clinical evidence across multiple inference steps. Large language models (LLMs) now approach this through two routes: internalized reasoning and externalized agent…

19
arXiv — NLP / Computation & Language research 15d ago

Adaptive Activation Steering for Efficient LLM Reasoning via Closed-Loop PID Control

arXiv:2506.18831v3 Announce Type: replace Abstract: Reasoning LLMs trained with long chain-of-thought often overthink: they spend tokens on redundant reflection and transitions that inflate cost without improving accuracy. Static activation steering (e.g. SEAL) suppresses such…

6
arXiv — NLP / Computation & Language research 15d ago

EngTrace: A Symbolic Benchmark for Verifiable Process Supervision of Engineering Reasoning

arXiv:2511.01650v3 Announce Type: replace Abstract: Large Language Models (LLMs) are increasingly entering specialized, safety-critical engineering workflows governed by strict quantitative standards and immutable physical laws, making rigorous evaluation of their reasoning…

38
arXiv — NLP / Computation & Language research 15d ago

Atlas: Orchestrating Heterogeneous Models and Tools for Multi-Domain Complex Reasoning

arXiv:2601.03872v2 Announce Type: replace Abstract: The integration of large language models (LLMs) with external tools has significantly expanded the capabilities of AI agents. However, as the diversity of both LLMs and tools increases, selecting the optimal model-tool…

27
Hugging Face Daily Papers research 15d ago

ChLogic: Evaluating Robustness of Logical Reasoning in Chinese Expressions

Abstract ChLogic benchmark reveals persistent performance gaps between English and Chinese logical reasoning in large language models, influenced by surface realization differences and translation artifacts. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large language models…

37
Hugging Face Daily Papers research 15d ago

TRIAGE: Dialectical Reasoning for Explainable Risk Prediction on Irregularly Sampled Medical Time Series with LLMs

Abstract A framework called TRIAGE is proposed to improve clinical early warning systems by training large language models to generate dialectical reasoning for continuous risk scoring with better calibration and interpretability. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

29
r/LocalLLaMA community 15d ago

“Wait,” in reasoning models makes my eye twitch

I get that it helps, I know why they do it, but it’s still annoying as hell lol submitted by /u/Borkato

11
r/LocalLLaMA community 15d ago

GLM-5.2 just dropped open weights and it already looks weirdly strong for coding

GLM-5.2 just released and the early numbers look pretty insane. 1M context window, open weights, MIT license, two reasoning effort modes, and it is already showing up near the top of coding arenas. I know every new model gets hyped for 24 hours, but this one actually looks worth…

28
Hugging Face Daily Papers research 15d ago

ExpRL: Exploratory RL for LLM Mid-Training

Abstract ExpRL uses human-written question-answer data as reward scaffolds to provide automated reinforcement learning priming for language models, outperforming traditional methods on math reasoning tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Sparse reward reinforcement…

23
r/LocalLLaMA community 16d ago

Scaling former VibeThinker-1.5B to 3B — now it reaches frontier math & coding performance

https://preview.redd.it/obgodr9dfn7h1.png?width=1796&format=png&auto=webp&s=b5fd95e2b7e6f8ed7704e3de66778e970d34a1dd We trained VibeThinker-3B to test how far verifiable reasoning can be pushed in a strict small-model regime. It gets 94.3 on AIME'26, 80.2 on LiveCodeBench v6,…

36
Hugging Face Daily Papers research 16d ago

Ling and Ring 2.6 Technical Report: Efficient and Instant Agentic Intelligence at Trillion-Parameter Scale

Abstract Ling-2.6 and Ring-2.6 models are presented as scalable solutions for agentic intelligence, featuring architectural upgrades and specialized training methods to balance fast response times with advanced reasoning capabilities. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

34
r/LocalLLaMA community 16d ago

Gemma 12b - Reasoning hardening instructions

I've become quite happy with Gemma 12b QAT as a general assistant lately. It is small enough to run on my PC while still leaving plenty of VRAM free for other tasks and fast enough that I don't have to go make coffee while it thinks. I saw someone on youtube throwing trick…

36
Hugging Face Daily Papers research 16d ago

Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning

Abstract Prompt-Level Distillation extracts reasoning patterns from teacher models to enhance student model performance while maintaining interpretability and reducing latency. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Advanced reasoning typically requires Chain-of-Thought…

18
Smol AI News news-outlet 16d ago

GLM 5.2: the top Frontend Coding model in the world, IndexShare reduces costs

Z.ai released GLM-5.2, an MIT-licensed open-weight frontier model targeting coding and long-horizon agentic tasks with a 1M-token context window and two reasoning-effort modes. It features a 744B-parameter mixture-of-experts architecture with 40B active…

14
Hugging Face Daily Papers research 16d ago

Implicit Reasoning for Large Language Model-based Generative Recommendation

Abstract Large Language Models for generative recommendation face challenges with semantic IDs disrupting natural-language reasoning, prompting a lightweight implicit reasoning approach that outperforms explicit methods while reducing computational costs. Generated by…

16
arXiv — Machine Learning research 16d ago

Beyond Accuracy: Measuring Bias Acknowledgment in Chain-of-Thought Reasoning for Responsible AI Evaluation

arXiv:2606.15127v1 Announce Type: new Abstract: Reasoning models are increasingly used in settings where the final answer is not the only object of review: educational tools may show students intermediate steps, decision-support systems may require human oversight, and audit…

11
arXiv — Machine Learning research 16d ago

Semantic Reasoning in Medicine: The Role of Knowledge Graphs Across Five Key Domains

arXiv:2606.15155v1 Announce Type: new Abstract: Knowledge graphs (KGs) have emerged as a promising solution for integrating and reasoning over complex biomedical and clinical data in healthcare. By representing structured relationships among entities such as diseases, drugs,…

17
arXiv — Machine Learning research 16d ago

Understanding Diversity Collapse in RLVR via the Lens of Overtraining

arXiv:2606.15455v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) has become a key approach for enhancing the reasoning abilities of large language models. However, RLVR often suffers from diversity collapse: Pass@1 improves while…

6
arXiv — Machine Learning research 16d ago

Localizing Credit at the Divergence: Path-Conditioned Self-Distillation for LLM Reasoning

arXiv:2606.15576v1 Announce Type: new Abstract: Reinforcement learning from verifiable rewards assigns a single scalar to each rollout, leaving token-level credit assignment underspecified in long reasoning traces. On-policy self-distillation addresses this by letting the same…

6
arXiv — Machine Learning research 16d ago

Is Code Better Than Language for Algorithmic Reasoning

arXiv:2606.15589v1 Announce Type: new Abstract: For tool-augmented language models, comparing natural-language reasoning with code-execution pipelines is difficult because the comparison changes both the intermediate representation and the execution mechanism. We separate these…

29
arXiv — Machine Learning research 16d ago

Formalizing and Mitigating Structural Distortion in LLM Attention for Zero-Shot Graph Reasoning

arXiv:2606.15633v1 Announce Type: new Abstract: Large Language Models (LLMs) have shown promise for reasoning over Text-Attributed Graphs (TAGs). However, applying LLMs to graphs requires linearizing their structure into sequences, introducing distortion rooted in the graph…

9
arXiv — Machine Learning research 16d ago

ReQAT: Achieving Full-Precision Reasoning Accuracy with 4-bit Floating-Point Quantization-Aware Training

arXiv:2606.15682v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) achieve strong problem-solving through long chain-of-thought, but their deployment is constrained by the high cost of full-precision inference and growing KV cache footprints. Microscaled FP4 formats…

35
