Tag

Reasoning

500 articles archived under #reasoning · RSS

Hugging Face Daily Papers research 9d ago

Learning from Your Own Mistakes: Constructing Learnable Micro-Reflective Trajectories for Self-Distillation

Abstract: Trajectory-Augmented Policy Optimization (TAPO) enhances large language model reasoning by creating explicit corrective trajectories that preserve erroneous reasoning while incorporating natural-language diagnoses and corrections, outperforming traditional…

31
Hugging Face Daily Papers research 9d ago

Manifold Bandits: Bayesian Curriculum Learning over the Latent Geometry of Large Language Models

Abstract: Reinforcement learning approaches for improving LLM reasoning capabilities are enhanced by a Bayesian Manifold Curriculum framework that structures problem sampling based on task manifold relationships and endogenous non-stationarity. Generated by…

20
Hacker News — AI on Front Page community 9d ago

VibeThinker: 3B param model that beats Opus 4.5 on reasoning with novel SFT+GRPO

Article URL: https://arxiv.org/abs/2606.16140 · Comments URL: https://news.ycombinator.com/item?id=48639240 · Points: 211 · Comments: 85

26
r/LocalLLaMA community 9d ago

NEX-N2-mini: "There is no Pareto frontier. I am Pareto". This Qwen3.5-MoE fine tune fixed 3.5 and 3.6 overthinking apparently on my tests.

I have been testing all the popular MoE models for my Mac, and it seems I just found gold: 3.5/3.6 level of reasoning (if not slightly superior) at a fraction of the reasoning tokens used (wasted). Dynamic plot with other benchmarks here: https://benchmark-yourself.streamlit.app/…

4
Hugging Face Daily Papers research 10d ago

Multi-Turn Reflective Masking Elicits Reasoning in Mask Diffusion Models

Abstract: Reflective Masking enables iterative local refinement in Mask Diffusion Models through lightweight post-training, supporting multi-turn reasoning without architectural changes. Generated by Qwen/Qwen2.5-Coder-32B-Instruct. While reasoning on autoregressive (AR) models is…

26
r/LocalLLaMA community 11d ago

8-16 MI50s Minimax M3 @19 tps TG (peak)

TL;DR: Speeds are not too ugly for this old 2018 hardware but, imo, not very usable for agentic coding (compare with qwen3.6 27B on 8 MI50 @ 50 tps TG, 800 tps PP). More concerning is that the reasoning output is very, very long, and I still haven't checked the quality of…

27
r/LocalLLaMA community 12d ago

GLM 5.2: 98% of max level intelligence with less than half of tokens usage

According to this, the number of reasoning tokens more than doubled from GLM 5.1 to GLM 5.2, from 16.7k to 36.7k. For me, as a local user with an old junk Xeon setup, this makes GLM 5.2 unusable, to the extent that I had to shut down the model after 12h of waiting for it to respond to my math…

37
r/LocalLLaMA community 12d ago

How do I set the right llama.cpp parameters?

--n-gpu-layers all --ctx-size 0 --reasoning-budget 0 --presence-penalty 1.1 --repeat-penalty 1.1 How do I figure out the optimal llama.cpp parameters for my setup? llama.cpp + Open WebUI in Docker with an AMD GPU (16GB VRAM) running gemma 4 12b and 26b models. Is it all about…

13
Hugging Face Daily Papers research 12d ago

Context-Aware RL for Agentic and Multimodal LLMs

Abstract: ContextRL enhances long-horizon reasoning and multimodal performance through reinforcement learning that rewards context selection for supporting query-answer pairs, achieving improvements over standard methods on diverse benchmarks. Generated by…

21
r/LocalLLaMA community 13d ago

Watching a local AI voice assistant get dumber (A 9B to 0.8B agent experiment on my RTX 5060 Ti)

I wanted to find the exact floor for running an intelligent, local voice assistant agent on consumer hardware. Keeping the environment, tools, and prompts identical, I stepped the model sizes down through Qwen 3.5 9B, 4B, 2B, and 0.8B to see how agentic reasoning degrades. The…

12
r/LocalLLaMA community 13d ago

Has anyone here used VibeThinker-3B outside benchmarks?

Just curious, given the hype and benchmark numbers. Curious about real-world behavior: debugging, coding assistance, reasoning over messy prompts, local latency, failure modes, and whether it actually feels useful versus just optimized for verifiable evals.…

23
arXiv — NLP / Computation & Language research 13d ago

Thermodynamic Signatures of Reasoning: Free-Energy and Spectral-Form-Factor Diagnostics for Hallucination Detection in Large Language Models

arXiv:2606.19404v1 Announce Type: cross Abstract: Hallucination detection in large language models (LLMs) is deployment-critical, and recent work shows that the spectrum of attention-derived graph Laplacians carries strong signal about reasoning quality. Prior spectral…

15
arXiv — Machine Learning research 13d ago

Concept Flow Models: Anchoring Concept-Based Reasoning with Hierarchical Bottlenecks

arXiv:2606.19489v1 Announce Type: new Abstract: Concept Bottleneck Models (CBMs) enhance interpretability by projecting learned features into a human-understandable concept space. Recent approaches leverage vision-language models to generate concept embeddings, reducing the need…

8
arXiv — Machine Learning research 13d ago

Hard or Just Unreached? Diagnosing the Sampling Blind Spot in Math-Reasoning Difficulty Estimation

arXiv:2606.19636v1 Announce Type: new Abstract: Math and science reasoning benchmarks rely on pass@k, the fraction of sampled chains that reach gold, as the canonical per-example difficulty signal. The same signal drives RL with verifiable rewards, math data curation, synthetic…

20
arXiv — NLP / Computation & Language research 13d ago

Efficiently Representing Algorithms With Chain-of-Thought Transformers

arXiv:2606.19697v1 Announce Type: cross Abstract: The increasing popularity of reasoning models (language models that output a series of reasoning or thought tokens before producing an answer) is justified, in part, by theoretical results showing that chain-of-thought…

9
arXiv — NLP / Computation & Language research 13d ago

Manifold Bandits: Bayesian Curriculum Learning over the Latent Geometry of Large Language Models

arXiv:2606.19750v1 Announce Type: cross Abstract: Reinforcement learning (RL) is a central approach for improving reasoning capabilities in large language models (LLMs), where training efficiency depends critically on how problems are sampled during optimization. Existing…

15
arXiv — Machine Learning research 13d ago

ADaPT: Token-Level Decoupling for Efficient Large Reasoning Models

arXiv:2606.19919v1 Announce Type: new Abstract: Large reasoning models rely on long chain-of-thought to achieve strong performance, but applying such reasoning uniformly incurs high computational cost. Existing efficiency-oriented methods attempt to shorten or mix reasoning…

11
arXiv — Machine Learning research 13d ago

VIMPO: Value-Implicit Policy Optimization for LLMs

arXiv:2606.20008v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards has become a central tool for improving the reasoning ability of large language models, but current methods face a trade-off between simplicity and credit assignment. Group-relative…

6
arXiv — NLP / Computation & Language research 13d ago

What Makes Effective Supervision in Latent Chain-of-Thought: An Information-Theoretic Analysis

arXiv:2606.20075v1 Announce Type: cross Abstract: Latent Chain-of-Thought (CoT) internalizes reasoning within continuous hidden states, offering a promising alternative to verbose discrete reasoning traces. However, robust latent reasoning remains difficult because outcome…

36
arXiv — NLP / Computation & Language research 13d ago

Pruning via Causal Attribution Preserves Reasoning Performance in Large Language Models

arXiv:2606.19350v1 Announce Type: new Abstract: Large language models (LLMs) excel at multi-step reasoning but incur substantial inference cost. We introduce Causal Attribution Pruning (CAP), a training-free method that identifies critical attention heads by measuring their…

34
arXiv — NLP / Computation & Language research 13d ago

Detecting Hallucinations for Large Language Model-based Knowledge Graph Reasoning

arXiv:2606.19351v1 Announce Type: new Abstract: Knowledge graph (KG) reasoning infers new knowledge from existing facts and is widely applied in question answering, recommendation, and decision support. With the rapid development of large language models (LLMs), LLM-based KG…

25
arXiv — NLP / Computation & Language research 13d ago

Granularity-Regulated Adaptive Computational Efficiency for Optimal Verification in Test-Time Scaling

arXiv:2606.19354v1 Announce Type: new Abstract: Test-time scaling (TTS) has emerged as a powerful paradigm for improving the reasoning performance of large language models (LLMs) by investing additional compute at inference time. A central component of TTS is the…

5
arXiv — NLP / Computation & Language research 13d ago

Where Does Social Reasoning Come From? Capability Provenance in Language Models

arXiv:2606.19625v1 Announce Type: new Abstract: We use training-data attribution as an interpretable tool for capability discovery, mapping which regions of the pretraining corpus support social-reasoning versus STEM-reasoning in OLMo3-7B. Training-data attribution measures how…

9
arXiv — NLP / Computation & Language research 13d ago

Clusters are All You Need: Pre-Training the Tsetlin Machine with Semantic Clusters from Language Models for Interpretability

arXiv:2606.19815v1 Announce Type: new Abstract: Pre-trained language models such as BERT achieve strong text classification performance but lack transparency, limiting their use in high-stakes settings. The Tsetlin Machine (TM) offers fully interpretable, clause-based reasoning…

25
arXiv — NLP / Computation & Language research 13d ago

AtomMem: Building Simple and Effective Memory System for LLM Agents via Atomic Facts

arXiv:2606.19847v1 Announce Type: new Abstract: Large language models (LLMs) demonstrate strong reasoning and generation abilities, but their fixed context windows limit long-term information accumulation and reuse across multi-session interactions. Existing memory-augmented…

32
arXiv — NLP / Computation & Language research 13d ago

GEMS: Geometric Constraints Enable Multi-Semantic Superposition in LLMs

arXiv:2606.19946v1 Announce Type: new Abstract: Activation steering controls model behavior by modifying intermediate hidden states at inference time without retraining. Existing methods handle only single-direction injection; when multiple semantic directions are superposed…

16
arXiv — NLP / Computation & Language research 13d ago

MedRLM: Recursive Multimodal Health Intelligence for Long-Context Clinical Reasoning, Sensor-Guided Screening, Evidence-Grounded Decision Support, and Community-to-Tertiary Referral Optimization

arXiv:2606.20164v1 Announce Type: new Abstract: Real-world clinical decision support requires reasoning over heterogeneous and longitudinal patient information rather than answering isolated medical questions. However, current medical large language models and…

29
arXiv — NLP / Computation & Language research 13d ago

Think Again or Think Longer? Selective Verification for Budget-Aware Reasoning

arXiv:2606.19808v1 Announce Type: cross Abstract: Test-time reasoning is increasingly used as a serving-time control knob, but extra reasoning is not uniformly valuable: it can repair failed attempts, waste compute on already-correct answers, or introduce harmful answer changes.…

25
arXiv — NLP / Computation & Language research 13d ago

Med-R2: Perception and Reflection-driven Complex Reasoning for Medical Report Generation

arXiv:2504.02885v2 Announce Type: replace Abstract: Automated medical report generation (MRG) is increasingly used to reduce the burden of manual reporting and for decision support. Large vision-language models (LVLMs) hold great promise for automated MRG due to their…

29
Hugging Face Daily Papers research 13d ago

S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence

Abstract: S-Agent is a spatial reasoning framework that enhances visual language models with temporal memory and hierarchical spatial tools to enable continuous 3D world understanding from multi-view imagery. Generated by Qwen/Qwen2.5-Coder-32B-Instruct. Real-world spatial…

28
Hugging Face Daily Papers research 13d ago

Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance

Abstract: A lightweight image inpainting framework achieves high-fidelity results with significantly reduced parameters and inference time through novel local-global interaction blocks and adaptive distillation strategies. Generated by Qwen/Qwen2.5-Coder-32B-Instruct. While…

35
Hugging Face Daily Papers research 13d ago

Thinking with Visual Grounding

Abstract: Visually grounded thinking integrates natural-language reasoning with explicit visual evidence grounding in vision-language models, improving reasoning accuracy through scalable synthesis and reinforcement learning techniques. Generated by…

34
Hugging Face Daily Papers research 13d ago

REVES: REvision and VErification--Augmented Training for Test-Time Scaling

Abstract: A two-stage iterative framework alternates between data augmentation and policy optimization to improve LLM reasoning by leveraging intermediate correction steps, achieving superior performance on coding benchmarks and constraint satisfaction problems. Generated by…

23
TechCrunch — AI news-outlet 13d ago

General Intuition in talks to raise $300M at around $2B valuation

General Intuition is in talks to raise around $300 million at a roughly $2 billion valuation from backers including Jeff Bezos. The startup trains AI agents on spatial-temporal reasoning.

14
Hugging Face Daily Papers research 14d ago

From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning

Abstract: A framework automates environment redesign in reinforcement learning for large language models by having the policy analyze failures and suggest configuration changes, achieving superior performance over larger proprietary models and fixed-environment baselines.…

6
OpenAI official-blog 14d ago

Improving health intelligence in ChatGPT

Learn how GPT-5.5 Instant improves ChatGPT’s health and wellness responses with stronger reasoning, better context, clearer communication, and physician-informed evaluations.

7
OpenAI official-blog 14d ago

Using AI to help physicians diagnose rare genetic diseases affecting children

Researchers used an OpenAI reasoning model to help diagnose rare diseases, identifying 18 new diagnoses in previously unsolved cases.

17
Hugging Face Daily Papers research 14d ago

SciOrch: Learning to Orchestrate Expert LLMs for Solving Frontier Multimodal Scientific Reasoning Tasks

Abstract: SciOrch is a framework that uses a lightweight orchestrator model to coordinate multiple frontier LLMs for scientific reasoning, achieving superior performance through MCTS-based training and GRPO-style optimization while reducing API costs. Generated by…

31
Hugging Face Daily Papers research 14d ago

Native Active Perception as Reasoning for Omni-Modal Understanding

Abstract: OmniAgent is a novel omni-modal agent that addresses long video understanding by using an iterative observation-thought-action cycle with active perception, achieving superior performance compared to larger models through efficient selective processing. Generated by…

24
Hugging Face Daily Papers research 14d ago

Reinforcing Dual-Path Reasoning in Spatial Vision Language Models

Abstract: A unified framework for spatial vision-language models that combines linguistic deduction and 3D geometric reasoning through reinforcement learning, enabling robust spatial reasoning across diverse tasks and domains. Generated by Qwen/Qwen2.5-Coder-32B-Instruct. Spatial…

9
Hugging Face Daily Papers research 14d ago

Seeing Before Reasoning: Decoupling Perception and Reasoning for Shortcut-Resilient Multimodal On-Policy Self-Distillation

Abstract: ViGOS is a visually grounded on-policy self-distillation framework for multimodal large language models that improves image-grounded behavior by using specialized teachers for different stages of reasoning and handling invalid rollouts. Generated by…

8
arXiv — NLP / Computation & Language research 14d ago

Breaking the Solver Bottleneck: Training Task Generators at the Learnable Frontier

arXiv:2606.18284v1 Announce Type: cross Abstract: The limiting resource for training agents via reinforcement learning (RL) is increasingly frontier task supply: valid, solvable tasks just difficult enough to train the current model. As reasoning and agentic models improve,…

21
arXiv — Machine Learning research 14d ago

Sparsity Curse: Understanding RLVR Model Parameter Space from Model Merging

arXiv:2606.18521v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Reward (RLVR) has emerged as a powerful post-training paradigm that surpasses Supervised Fine-Tuning (SFT) in eliciting reasoning intelligence and resisting catastrophic forgetting. Recent…

13
arXiv — Machine Learning research 14d ago

Learning from Own Solutions: Self-Conditioned Credit Assignment for Reinforcement Learning with Verifiable Rewards

arXiv:2606.18810v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) has driven substantial progress in training LLMs for reasoning tasks, but representative methods such as GRPO assign uniform credit across all tokens, wasting gradient on…

11
arXiv — Machine Learning research 14d ago

Learning from Your Own Mistakes: Constructing Learnable Micro-Reflective Trajectories for Self-Distillation

arXiv:2606.18844v1 Announce Type: new Abstract: Self-distillation improves reasoning in large language models by using the model's own rollouts as training signal, typically through implicit logit-level alignment that minimizes KL divergence toward a privileged target…

14
arXiv — NLP / Computation & Language research 14d ago

REVES: REvision and VErification--Augmented Training for Test-Time Scaling

arXiv:2606.18910v1 Announce Type: cross Abstract: Test-time scaling via sequential revision has emerged as a powerful paradigm for enhancing Large Language Model (LLM) reasoning. However, standard post-training methods primarily optimize single-shot objectives, creating a…

27
arXiv — Machine Learning research 14d ago

EfficientRollout: System-Aware Self-Speculative Decoding for RL Rollouts

arXiv:2606.18967v1 Announce Type: new Abstract: Reinforcement learning (RL) has become a representative post-training paradigm for LLMs, enabling strong reasoning and agentic capabilities. However, rollout generation remains a dominant latency bottleneck because autoregressive…

25
arXiv — Machine Learning research 14d ago

Seeing Before Reasoning: Decoupling Perception and Reasoning for Shortcut-Resilient Multimodal On-Policy Self-Distillation

arXiv:2606.19120v1 Announce Type: new Abstract: On-policy self-distillation (OPSD) trains a model on its own rollouts and uses a frozen copy to provide dense token-level targets conditioned on a reference target. This works well for LLM reasoning, but a direct extension to…

31
arXiv — NLP / Computation & Language research 14d ago

LLM Parameters for Math Across Languages: Shared or Separate?

arXiv:2606.18453v1 Announce Type: new Abstract: Large language models (LLMs) exhibit substantial cross-lingual variation in mathematical reasoning performance, but it remains unclear whether these differences reflect language-specific parameters or a shared mechanism that…

27
arXiv — NLP / Computation & Language research 14d ago

Towards Scalable Customization and Deployment of Multi-Agent Systems for Enterprise Applications

arXiv:2606.18502v1 Announce Type: new Abstract: Large language model (LLM)-based multi-agent systems demonstrate strong performance on complex reasoning and task execution, enabling broad enterprise applications. However, production deployment remains challenging due to…

38
