Reasoning: 500 articles archived under #reasoning
arXiv — NLP / Computation & Language research 14d ago PragReST: Self-Reinforcing Counterfactual Reasoning for Pragmatic Language Understanding arXiv:2606.18624v1 Announce Type: new Abstract: Natural language understanding often depends on meanings that are implied rather than explicitly stated, requiring pragmatic reasoning. Despite strong performance on math and logical reasoning, large language models (LLMs) still… 6 arXiv — NLP / Computation & Language research 14d ago TW-LegalBench: Measuring Taiwanese Legal Understanding arXiv:2606.18699v1 Announce Type: new Abstract: Large language models (LLMs) have shown impressive capabilities across diverse tasks, yet their performance on jurisdiction-specific legal reasoning remains underexplored. We present TW-LegalBench that utilizes Taiwanese legal… 22 arXiv — NLP / Computation & Language research 14d ago Beyond Reward Engineering: A Data Recipe for Long-Context Reinforcement Learning arXiv:2606.18831v1 Announce Type: new Abstract: Long-context reasoning is an essential capability for large language models, particularly when they are deployed as autonomous agents that must reason over lengthy trajectories. Reinforcement learning (RL) has recently emerged as a… 36 arXiv — NLP / Computation & Language research 14d ago ScholarSum: Student-Teacher Abstractive Summarization via Knowledge Graph Reasoning and Reflective Refinement arXiv:2606.18850v1 Announce Type: new Abstract: Abstractive summarization plays a crucial role in enabling efficient understanding of scientific literature, yet it inherently demands both linguistic fluency and factual faithfulness. 
Existing approaches often fail to reconcile… 28 arXiv — NLP / Computation & Language research 14d ago GraphPO: Graph-based Policy Optimization for Reasoning Models arXiv:2606.18954v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has become a standard paradigm for enhancing the capability of large reasoning models. RLVR typically samples responses independently and optimizes the policy using from final… 9 arXiv — NLP / Computation & Language research 14d ago Enhancing Multilingual Reasoning via Steerable Model Merging arXiv:2606.19002v1 Announce Type: new Abstract: Model merging is an effective technique for composing the capabilities of a multilingual model and a reasoning model. It has achieved promising generalization in multilingual reasoning tasks by aligning feature spaces of different… 36 arXiv — NLP / Computation & Language research 14d ago DreamReasoner-8B: Block-Size Curriculum Learning for Diffusion Reasoning Models arXiv:2606.19257v1 Announce Type: new Abstract: Block diffusion language models accelerate decoding through parallel block-wise denoising, yet whether they can be reliably scaled for long chain-of-thought (CoT) reasoning remains unresolved. 
To this end, we develop… 18 arXiv — NLP / Computation & Language research 14d ago Decoupling Search from Reasoning: A Vendor-Agnostic Grounding Architecture for LLM Agents arXiv:2606.18947v1 Announce Type: cross Abstract: Production LLM agents increasingly depend on real-time search, yet native search grounding bundles retrieval policy, provider choice, evidence injection, cost, latency, and generation behavior behind a single model-provider… 20 arXiv — NLP / Computation & Language research 14d ago STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability arXiv:2606.19236v1 Announce Type: cross Abstract: Reinforcement Learning with Verifiable Rewards algorithms like GRPO have emerged as the dominant post-training paradigm for complex reasoning in LLMs, yet commonly suffer from policy entropy collapse during training. We conduct a… 35 arXiv — NLP / Computation & Language research 14d ago Structured Inference with Large Language Gibbs arXiv:2606.19264v1 Announce Type: cross Abstract: The knowledge encoded in large language models (LLMs) can serve as a substrate for structured reasoning over variables describing a complex world, but accessing this knowledge in a probabilistically coherent manner poses a… 8 arXiv — NLP / Computation & Language research 14d ago Rethinking Reward Supervision: Rubric-Conditioned Self-Distillation arXiv:2606.19327v1 Announce Type: cross Abstract: Post-training of reasoning language models is commonly driven by supervised distillation and reinforcement learning with verifiable rewards. 
Distillation often relies on chain-of-thought annotations that are expensive to obtain… 35 arXiv — NLP / Computation & Language research 14d ago Native Active Perception as Reasoning for Omni-Modal Understanding arXiv:2606.19341v1 Announce Type: cross Abstract: Passive models for long video understanding typically rely on a "watch-it-all" paradigm, processing frames uniformly regardless of query difficulty, causing computational cost to grow with video duration. Although interactive… 12 arXiv — NLP / Computation & Language research 14d ago ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark arXiv:2505.23851v3 Announce Type: replace Abstract: Large language models (LLMs) are increasingly applied to symbolic mathematics, yet existing evaluations often conflate pattern memorization with genuine reasoning. To address this gap, we present ASyMOB, a high-resolution… 38 arXiv — NLP / Computation & Language research 14d ago UniECG: Understanding and Generating ECG in One Unified Model arXiv:2509.18588v2 Announce Type: replace Abstract: Electrocardiogram (ECG) interpretation is a fundamental skill in medical education, yet students often need more than static examples to connect waveform evidence with diagnostic reasoning. This paper presents UniECG as a step… 38 arXiv — NLP / Computation & Language research 14d ago ActMem: Bridging the Gap Between Memory Retrieval and Reasoning in LLM Agents arXiv:2603.00026v2 Announce Type: replace Abstract: Memory management is essential for LLM agents in long-term interactions. Current memory frameworks typically treat agents as passive "recorders" and retrieve information without understanding its deeper implications. 
They may… 15 Hugging Face Daily Papers research 14d ago Guava: An Effective and Universal Harness for Embodied Manipulation Abstract A harness framework for embodied tool use combines high-level reasoning with external modules, enabling compact models to perform complex manipulation tasks with minimal training data. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Language models trained on large-scale… 15 Hugging Face Daily Papers research 14d ago Sumi: Open Uniform Diffusion Language Model from Scratch Abstract A large-scale uniform diffusion language model pretrained from scratch demonstrates competitive performance on knowledge and reasoning tasks while highlighting differences in commonsense reasoning compared to autoregressive models. Generated by… 15 Hugging Face Daily Papers research 15d ago Visual-Seeker: Towards Visual-Native Multimodal Agentic Search via Active Visual Reasoning Abstract Visual-Seeker enables visual-native multimodal deep search through active visual reasoning, outperforming proprietary models on real-world web environments. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multimodal large language models (MLLMs) have demonstrated… 25 arXiv — Machine Learning research 15d ago Learning to Refine Hidden States for Reliable LLM Reasoning arXiv:2606.17524v1 Announce Type: new Abstract: Large language models show strong reasoning ability, but their internal reasoning process can remain unstable in complex multi-step settings, where early hidden-state errors may propagate to incorrect predictions. We propose ReLAR,… 35 arXiv — Machine Learning research 15d ago Continual Self-Improvement with Lightweight Experiential Latent Memories arXiv:2606.17803v1 Announce Type: new Abstract: Large language models achieve strong reasoning performance by scaling inference-time compute, yet remain fundamentally stateless, discarding the rich, self-produced reasoning traces generated during this process. 
We investigate… 21 arXiv — Machine Learning research 15d ago From Reasoning Traces to Reusable Modules: Understanding Compositional Generalization in Language Model Reasoning arXiv:2606.18089v1 Announce Type: new Abstract: Post-training pipelines that combine supervised fine-tuning (SFT) with reinforcement learning (RL) have emerged as the key recipe for transforming large language models (LLMs) into robust reasoners. We argue that this combined… 17 arXiv — NLP / Computation & Language research 15d ago Decoding Hidden Deception in Reasoning LLMs: Activation Explainers for Deception Auditing arXiv:2606.17478v1 Announce Type: new Abstract: As LLMs acquire stronger reasoning capabilities, deceptive behavior becomes an increasingly serious safety concern. Existing deception monitors either score visible transcripts or derive scalar probe scores from representation… 23 arXiv — NLP / Computation & Language research 15d ago From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning arXiv:2606.17682v1 Announce Type: new Abstract: Reinforcement learning pipelines for Large Language Model (LLM) training often rely on manually redesigned environments between stages, requiring practitioners to heuristically infer which configuration will best improve the… 28 arXiv — NLP / Computation & Language research 15d ago SuCo: Sufficiency-guided Continuous Adaptive Reasoning arXiv:2606.17687v1 Announce Type: new Abstract: Despite remarkable performance on complex tasks, Large Reasoning Models (LRMs) often generate excessively long Chain-of-Thoughts (CoT), inflating computational costs even for simple queries. 
Existing efforts to mitigate this… 21 arXiv — NLP / Computation & Language research 15d ago Dynamic Rollout Editing for Reducing Overthinking in RL-Trained Reasoning Models arXiv:2606.17890v1 Announce Type: new Abstract: Long-form chain-of-thought reasoning can improve LLM performance on complex tasks, but models often continue generating unnecessary reasoning after a correct answer has emerged. We refer to this behavior as overthinking. We study… 29 arXiv — NLP / Computation & Language research 15d ago ChLogic: Evaluating Robustness of Logical Reasoning in Chinese Expressions arXiv:2606.17905v1 Announce Type: new Abstract: Large language models perform increasingly well on standardized logical reasoning benchmarks, but whether this ability remains robust beyond English is unclear. We introduce ChLogic, an English-Chinese aligned benchmark that tests… 10 arXiv — NLP / Computation & Language research 15d ago Visuals Lie, Consistency Speaks: Disentangling Spatial Attention from Reliability in Vision-Language Models arXiv:2606.17389v1 Announce Type: cross Abstract: Multimodal Foundation Models are increasingly used as reasoning agents, making reliability, knowing when a model may hallucinate, critical. 
A common intuition, which we call the Attention-Confidence Assumption, holds that… 24 arXiv — NLP / Computation & Language research 15d ago The Measurement Gap in the Automation of EU Law: Benchmarking Doctrinal Legal Reasoning under the EU AI Act arXiv:2606.18158v1 Announce Type: cross Abstract: Large language models now produce legal text of at least median quality, yet no existing benchmark can evaluate whether they perform doctrinal legal reasoning, which forms the interpretive core of legal work, rather than the… 38 arXiv — NLP / Computation & Language research 15d ago MedicalAgentsBench for Complex Medical Reasoning: Comparing Internalized Reasoning Models versus Externalized Agent-based Frameworks arXiv:2503.07459v3 Announce Type: replace Abstract: Complex medical reasoning requires integrating heterogeneous clinical evidence across multiple inference steps. Large language models (LLMs) now approach this through two routes: internalized reasoning and externalized agent… 19 arXiv — NLP / Computation & Language research 15d ago Adaptive Activation Steering for Efficient LLM Reasoning via Closed-Loop PID Control arXiv:2506.18831v3 Announce Type: replace Abstract: Reasoning LLMs trained with long chain-of-thought often overthink: they spend tokens on redundant reflection and transitions that inflate cost without improving accuracy. 
Static activation steering (e.g., SEAL) suppresses such… 6 arXiv — NLP / Computation & Language research 15d ago EngTrace: A Symbolic Benchmark for Verifiable Process Supervision of Engineering Reasoning arXiv:2511.01650v3 Announce Type: replace Abstract: Large Language Models (LLMs) are increasingly entering specialized, safety-critical engineering workflows governed by strict quantitative standards and immutable physical laws, making rigorous evaluation of their reasoning… 38 arXiv — NLP / Computation & Language research 15d ago Atlas: Orchestrating Heterogeneous Models and Tools for Multi-Domain Complex Reasoning arXiv:2601.03872v2 Announce Type: replace Abstract: The integration of large language models (LLMs) with external tools has significantly expanded the capabilities of AI agents. However, as the diversity of both LLMs and tools increases, selecting the optimal model-tool… 27 Hugging Face Daily Papers research 15d ago ChLogic: Evaluating Robustness of Logical Reasoning in Chinese Expressions Abstract ChLogic benchmark reveals persistent performance gaps between English and Chinese logical reasoning in large language models, influenced by surface realization differences and translation artifacts. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large language models… 37 Hugging Face Daily Papers research 15d ago TRIAGE: Dialectical Reasoning for Explainable Risk Prediction on Irregularly Sampled Medical Time Series with LLMs Abstract A framework called TRIAGE is proposed to improve clinical early warning systems by training large language models to generate dialectical reasoning for continuous risk scoring with better calibration and interpretability. 
Generated by Qwen/Qwen2.5-Coder-32B-Instruct… 29 r/LocalLLaMA community 15d ago “Wait,” in reasoning models makes my eye twitch I get that it helps, I know why they do it, but it’s still annoying as hell lol submitted by /u/Borkato 11 r/LocalLLaMA community 15d ago GLM-5.2 just dropped open weights and it already looks weirdly strong for coding GLM-5.2 just released and the early numbers look pretty insane. 1M context window, open weights, MIT license, two reasoning effort modes, and it is already showing up near the top of coding arenas. I know every new model gets hyped for 24 hours, but this one actually looks worth… 28 Hugging Face Daily Papers research 15d ago ExpRL: Exploratory RL for LLM Mid-Training Abstract ExpRL uses human-written question-answer data as reward scaffolds to provide automated reinforcement learning priming for language models, outperforming traditional methods on math reasoning tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Sparse reward reinforcement… 23 r/LocalLLaMA community 16d ago Scaling former VibeThinker-1.5B to 3B — now it reaches frontier math & coding performance https://preview.redd.it/obgodr9dfn7h1.png?width=1796&format=png&auto=webp&s=b5fd95e2b7e6f8ed7704e3de66778e970d34a1dd We trained VibeThinker-3B to test how far verifiable reasoning can be pushed in a strict small-model regime. It gets 94.3 on AIME'26, 80.2 on LiveCodeBench v6,… 36 Hugging Face Daily Papers research 16d ago Ling and Ring 2.6 Technical Report: Efficient and Instant Agentic Intelligence at Trillion-Parameter Scale Abstract Ling-2.6 and Ring-2.6 models are presented as scalable solutions for agentic intelligence, featuring architectural upgrades and specialized training methods to balance fast response times with advanced reasoning capabilities. 
Generated by Qwen/Qwen2.5-Coder-32B-Instruct… 34 r/LocalLLaMA community 16d ago Gemma 12b - Reasoning hardening instructions I've become quite happy with Gemma 12b QAT as a general assistant lately. It is small enough to run on my PC while still leaving plenty of VRAM free for other tasks and fast enough that I don't have to go make coffee while it thinks. I saw someone on YouTube throwing trick… 36 Hugging Face Daily Papers research 16d ago Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning Abstract Prompt-Level Distillation extracts reasoning patterns from teacher models to enhance student model performance while maintaining interpretability and reducing latency. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Advanced reasoning typically requires Chain-of-Thought… 18 Smol AI News news-outlet 16d ago GLM 5.2: the top Frontend Coding model in the world, IndexShare reduces costs **Z.ai released GLM-5.2**, an MIT-licensed open-weight frontier model targeting **coding and long-horizon agentic tasks** with a **1M-token context window** and **two reasoning-effort modes**. It features a **744B-parameter mixture-of-experts architecture** with **40B active… 14 Hugging Face Daily Papers research 16d ago Implicit Reasoning for Large Language Model-based Generative Recommendation Abstract Large Language Models for generative recommendation face challenges with semantic IDs disrupting natural-language reasoning, prompting a lightweight implicit reasoning approach that outperforms explicit methods while reducing computational costs. 
Generated by… 16 arXiv — Machine Learning research 16d ago Beyond Accuracy: Measuring Bias Acknowledgment in Chain-of-Thought Reasoning for Responsible AI Evaluation arXiv:2606.15127v1 Announce Type: new Abstract: Reasoning models are increasingly used in settings where the final answer is not the only object of review: educational tools may show students intermediate steps, decision-support systems may require human oversight, and audit… 11 arXiv — Machine Learning research 16d ago Semantic Reasoning in Medicine: The Role of Knowledge Graphs Across Five Key Domains arXiv:2606.15155v1 Announce Type: new Abstract: Knowledge graphs (KGs) have emerged as a promising solution for integrating and reasoning over complex biomedical and clinical data in healthcare. By representing structured relationships among entities such as diseases, drugs,… 17 arXiv — Machine Learning research 16d ago Understanding Diversity Collapse in RLVR via the Lens of Overtraining arXiv:2606.15455v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) has become a key approach for enhancing the reasoning abilities of large language models. However, RLVR often suffers from diversity collapse: Pass@1 improves while… 6 arXiv — Machine Learning research 16d ago Localizing Credit at the Divergence: Path-Conditioned Self-Distillation for LLM Reasoning arXiv:2606.15576v1 Announce Type: new Abstract: Reinforcement learning from verifiable rewards assigns a single scalar to each rollout, leaving token-level credit assignment underspecified in long reasoning traces. 
On-policy self-distillation addresses this by letting the same… 6 arXiv — Machine Learning research 16d ago Is Code Better Than Language for Algorithmic Reasoning arXiv:2606.15589v1 Announce Type: new Abstract: For tool-augmented language models, comparing natural-language reasoning with code-execution pipelines is difficult because the comparison changes both the intermediate representation and the execution mechanism. We separate these… 29 arXiv — Machine Learning research 16d ago Formalizing and Mitigating Structural Distortion in LLM Attention for Zero-Shot Graph Reasoning arXiv:2606.15633v1 Announce Type: new Abstract: Large Language Models (LLMs) have shown promise for reasoning over Text-Attributed Graphs (TAGs). However, applying LLMs to graphs requires linearizing their structure into sequences, introducing distortion rooted in the graph… 9 arXiv — Machine Learning research 16d ago ReQAT: Achieving Full-Precision Reasoning Accuracy with 4-bit Floating-Point Quantization-Aware Training arXiv:2606.15682v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) achieve strong problem-solving through long chain-of-thought, but their deployment is constrained by the high cost of full-precision inference and growing KV cache footprints. Microscaled FP4 formats… 35