Reasoning (#reasoning)
500 articles archived under #reasoning

Hugging Face Daily Papers · research · 9d ago
Learning from Your Own Mistakes: Constructing Learnable Micro-Reflective Trajectories for Self-Distillation
Abstract: Trajectory-Augmented Policy Optimization (TAPO) enhances large language model reasoning by creating explicit corrective trajectories that preserve erroneous reasoning while incorporating natural-language diagnoses and corrections, outperforming traditional…
31

Hugging Face Daily Papers · research · 9d ago
Manifold Bandits: Bayesian Curriculum Learning over the Latent Geometry of Large Language Models
Abstract: Reinforcement learning approaches for improving LLM reasoning capabilities are enhanced by a Bayesian Manifold Curriculum framework that structures problem sampling based on task manifold relationships and endogenous non-stationarity.
Generated by…
20

Hacker News — AI on Front Page · community · 9d ago
VibeThinker: 3B param model that beats Opus 4.5 on reasoning with novel SFT+GRPO
Article URL: https://arxiv.org/abs/2606.16140
Comments URL: https://news.ycombinator.com/item?id=48639240
Points: 211
# Comments: 85
26

r/LocalLLaMA · community · 9d ago
NEX-N2-mini: "There is no Pareto frontier. I am Pareto". This Qwen3.5-MoE fine tune fixed 3.5 and 3.6 overthinking apparently on my tests.
I have been testing all popular MoE for my Mac and it seems I just found gold: 3.5/3.6 level of reasoning (if not slightly superior) at a fraction of the reasoning tokens used (wasted). Dynamic plot with other benchmarks here: https://benchmark-yourself.streamlit.app/…
4

Hugging Face Daily Papers · research · 10d ago
Multi-Turn Reflective Masking Elicits Reasoning in Mask Diffusion Models
Abstract: Reflective Masking enables iterative local refinement in Mask Diffusion Models through lightweight post-training, supporting multi-turn reasoning without architectural changes.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct
While reasoning on autoregressive (AR) models is…
26

r/LocalLLaMA · community · 11d ago
8-16 MI50s Minimax M3 @19 tps TG (peak)
TL;DR Speeds are not too ugly for this old 2018 hardware but imo, not very usable for agentic coding (if you compare with qwen3.6 27B on 8 MI50 @ 50 tps TG 800 tps PP). More concerning is that the reasoning output is very very long and still didn’t check about the quality of…
27

r/LocalLLaMA · community · 12d ago
GLM 5.2: 98% of max level intelligence with less than half of tokens usage
According to this number of reasoning tokens from GLM 5.1 to GLM 5.2 more than doubled from 16.7k to 36.7k and for me as a local user with old junk Xeon setup this makes GLM 5.2 unusable to the extent where I had to shut down model after 12h of waiting it to respond to my math…
37

r/LocalLLaMA · community · 12d ago
How do I set the right llama.cpp parameters?
--n-gpu-layers all --ctx-size 0 --reasoning-budget 0 --presence-penalty 1.1 --repeat-penalty 1.1
How do I figure out the optimal llama.cpp parameters for my setup? llama.cpp + Open WebUI in Docker with an AMD GPU (16GB VRAM) running gemma 4 12b and 26b models. Is it all about…
13

Hugging Face Daily Papers · research · 12d ago
Context-Aware RL for Agentic and Multimodal LLMs
Abstract: ContextRL enhances long-horizon reasoning and multimodal performance through reinforcement learning that rewards context selection for supporting query-answer pairs, achieving improvements over standard methods on diverse benchmarks.
Generated by…
21

r/LocalLLaMA · community · 13d ago
Watching a local AI voice assistant get dumber (A 9B to 0.8B agent experiment on my RTX 5060 Ti)
I wanted to find the exact floor for running an intelligent, local voice assistant agent on consumer hardware. I kept the environment, tools, and prompts identical, I stepped the model sizes down through Qwen 3.5 9B, 4B, 2B, and 0.8B to see how agentic reasoning degrades.
The…
12

r/LocalLLaMA · community · 13d ago
Has anyone here used VibeThinker-3B outside benchmarks?
Just curious, given the hype and benchmark numbers. Curious about real-world behavior: debugging, coding assistance, reasoning over messy prompts, local latency, failure modes, and whether it actually feels useful versus just optimized for verifiable evals.…
23

arXiv — NLP / Computation & Language · research · 13d ago
Thermodynamic Signatures of Reasoning: Free-Energy and Spectral-Form-Factor Diagnostics for Hallucination Detection in Large Language Models
arXiv:2606.19404v1 Announce Type: cross
Abstract: Hallucination detection in large language models (LLMs) is deployment-critical, and recent work shows that the spectrum of attention-derived graph Laplacians carries strong signal about reasoning quality. Prior spectral…
15

arXiv — Machine Learning · research · 13d ago
Concept Flow Models: Anchoring Concept-Based Reasoning with Hierarchical Bottlenecks
arXiv:2606.19489v1 Announce Type: new
Abstract: Concept Bottleneck Models (CBMs) enhance interpretability by projecting learned features into a human-understandable concept space. Recent approaches leverage vision-language models to generate concept embeddings, reducing the need…
8

arXiv — Machine Learning · research · 13d ago
Hard or Just Unreached? Diagnosing the Sampling Blind Spot in Math-Reasoning Difficulty Estimation
arXiv:2606.19636v1 Announce Type: new
Abstract: Math and science reasoning benchmarks rely on pass@k, the fraction of sampled chains that reach gold, as the canonical per-example difficulty signal.
The same signal drives RL with verifiable rewards, math data curation, synthetic…
20

arXiv — NLP / Computation & Language · research · 13d ago
Efficiently Representing Algorithms With Chain-of-Thought Transformers
arXiv:2606.19697v1 Announce Type: cross
Abstract: The increasing popularity of reasoning models (language models that output a series of reasoning or thought tokens before producing an answer) is justified, in part, by theoretical results showing that chain-of-thought…
9

arXiv — NLP / Computation & Language · research · 13d ago
Manifold Bandits: Bayesian Curriculum Learning over the Latent Geometry of Large Language Models
arXiv:2606.19750v1 Announce Type: cross
Abstract: Reinforcement learning (RL) is a central approach for improving reasoning capabilities in large language models (LLMs), where training efficiency depends critically on how problems are sampled during optimization. Existing…
15

arXiv — Machine Learning · research · 13d ago
ADaPT: Token-Level Decoupling for Efficient Large Reasoning Models
arXiv:2606.19919v1 Announce Type: new
Abstract: Large reasoning models rely on long chain-of-thought to achieve strong performance, but applying such reasoning uniformly incurs high computational cost. Existing efficiency-oriented methods attempt to shorten or mix reasoning…
11

arXiv — Machine Learning · research · 13d ago
VIMPO: Value-Implicit Policy Optimization for LLMs
arXiv:2606.20008v1 Announce Type: new
Abstract: Reinforcement learning with verifiable rewards has become a central tool for improving the reasoning ability of large language models, but current methods face a trade-off between simplicity and credit assignment.
Group-relative…
6

arXiv — NLP / Computation & Language · research · 13d ago
What Makes Effective Supervision in Latent Chain-of-Thought: An Information-Theoretic Analysis
arXiv:2606.20075v1 Announce Type: cross
Abstract: Latent Chain-of-Thought (CoT) internalizes reasoning within continuous hidden states, offering a promising alternative to verbose discrete reasoning traces. However, robust latent reasoning remains difficult because outcome…
36

arXiv — NLP / Computation & Language · research · 13d ago
Pruning via Causal Attribution Preserves Reasoning Performance in Large Language Models
arXiv:2606.19350v1 Announce Type: new
Abstract: Large language models (LLMs) excel at multi-step reasoning but incur substantial inference cost. We introduce Causal Attribution Pruning (CAP), a training-free method that identifies critical attention heads by measuring their…
34

arXiv — NLP / Computation & Language · research · 13d ago
Detecting Hallucinations for Large Language Model-based Knowledge Graph Reasoning
arXiv:2606.19351v1 Announce Type: new
Abstract: Knowledge graph (KG) reasoning infers new knowledge from existing facts and is widely applied in question answering, recommendation, and decision support. With the rapid development of large language models (LLMs), LLM-based KG…
25

arXiv — NLP / Computation & Language · research · 13d ago
Granularity-Regulated Adaptive Computational Efficiency for Optimal Verification in Test-Time Scaling
arXiv:2606.19354v1 Announce Type: new
Abstract: Test-time scaling (TTS) has emerged as a powerful paradigm for improving the reasoning performance of large language models (LLMs) by investing additional compute at inference time. A central component of TTS is the…
5

arXiv — NLP / Computation & Language · research · 13d ago
Where Does Social Reasoning Come From? Capability Provenance in Language Models
arXiv:2606.19625v1 Announce Type: new
Abstract: We use training-data attribution as an interpretable tool for capability discovery, mapping which regions of the pretraining corpus support social-reasoning versus STEM-reasoning in OLMo3-7B. Training-data attribution measures how…
9

arXiv — NLP / Computation & Language · research · 13d ago
Clusters are All You Need: Pre-Training the Tsetlin Machine with Semantic Clusters from Language Models for Interpretability
arXiv:2606.19815v1 Announce Type: new
Abstract: Pre-trained language models such as BERT achieve strong text classification performance but lack transparency, limiting their use in high-stakes settings. The Tsetlin Machine (TM) offers fully interpretable, clause-based reasoning…
25

arXiv — NLP / Computation & Language · research · 13d ago
AtomMem: Building Simple and Effective Memory System for LLM Agents via Atomic Facts
arXiv:2606.19847v1 Announce Type: new
Abstract: Large language models (LLMs) demonstrate strong reasoning and generation abilities, but their fixed context windows limit long-term information accumulation and reuse across multi-session interactions. Existing memory-augmented…
32

arXiv — NLP / Computation & Language · research · 13d ago
GEMS: Geometric Constraints Enable Multi-Semantic Superposition in LLMs
arXiv:2606.19946v1 Announce Type: new
Abstract: Activation steering controls model behavior by modifying intermediate hidden states at inference time without retraining.
Existing methods handle only single-direction injection; when multiple semantic directions are superposed…
16

arXiv — NLP / Computation & Language · research · 13d ago
MedRLM: Recursive Multimodal Health Intelligence for Long-Context Clinical Reasoning, Sensor-Guided Screening, Evidence-Grounded Decision Support, and Community-to-Tertiary Referral Optimization
arXiv:2606.20164v1 Announce Type: new
Abstract: Real-world clinical decision support requires reasoning over heterogeneous and longitudinal patient information rather than answering isolated medical questions. However, current medical large language models and…
29

arXiv — NLP / Computation & Language · research · 13d ago
Think Again or Think Longer? Selective Verification for Budget-Aware Reasoning
arXiv:2606.19808v1 Announce Type: cross
Abstract: Test-time reasoning is increasingly used as a serving-time control knob, but extra reasoning is not uniformly valuable: it can repair failed attempts, waste compute on already-correct answers, or introduce harmful answer changes.…
25

arXiv — NLP / Computation & Language · research · 13d ago
Med-R2: Perception and Reflection-driven Complex Reasoning for Medical Report Generation
arXiv:2504.02885v2 Announce Type: replace
Abstract: Automated medical report generation (MRG) is increasingly used to reduce the burden of manual reporting and for decision support. Large vision-language models (LVLMs) hold great promise for automated MRG due to their…
29

Hugging Face Daily Papers · research · 13d ago
S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence
Abstract: S-Agent is a spatial reasoning framework that enhances visual language models with temporal memory and hierarchical spatial tools to enable continuous 3D world understanding from multi-view imagery.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct
Real-world spatial…
28

Hugging Face Daily Papers · research · 13d ago
Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance
Abstract: A lightweight image inpainting framework achieves high-fidelity results with significantly reduced parameters and inference time through novel local-global interaction blocks and adaptive distillation strategies.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct
While…
35

Hugging Face Daily Papers · research · 13d ago
Thinking with Visual Grounding
Abstract: Visually grounded thinking integrates natural-language reasoning with explicit visual evidence grounding in vision-language models, improving reasoning accuracy through scalable synthesis and reinforcement learning techniques.
Generated by…
34

Hugging Face Daily Papers · research · 13d ago
REVES: REvision and VErification--Augmented Training for Test-Time Scaling
Abstract: A two-stage iterative framework alternates between data augmentation and policy optimization to improve LLM reasoning by leveraging intermediate correction steps, achieving superior performance on coding benchmarks and constraint satisfaction problems.
Generated by…
23

TechCrunch — AI · news-outlet · 13d ago
General Intuition in talks to raise $300M at around $2B valuation
General Intuition is in talks to raise around $300 million at a roughly $2 billion valuation from backers including Jeff Bezos. The startup trains AI agents on spatial-temporal reasoning.
14

Hugging Face Daily Papers · research · 14d ago
From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning
Abstract: A framework automates environment redesign in reinforcement learning for large language models by having the policy analyze failures and suggest configuration changes, achieving superior performance over larger proprietary models and fixed-environment baselines.…
6

OpenAI · official-blog · 14d ago
Improving health intelligence in ChatGPT
Learn how GPT-5.5 Instant improves ChatGPT’s health and wellness responses with stronger reasoning, better context, clearer communication, and physician-informed evaluations.
7

OpenAI · official-blog · 14d ago
Using AI to help physicians diagnose rare genetic diseases affecting children
Researchers used an OpenAI reasoning model to help diagnose rare diseases, identifying 18 new diagnoses in previously unsolved cases.
17

Hugging Face Daily Papers · research · 14d ago
SciOrch: Learning to Orchestrate Expert LLMs for Solving Frontier Multimodal Scientific Reasoning Tasks
Abstract: SciOrch is a framework that uses a lightweight orchestrator model to coordinate multiple frontier LLMs for scientific reasoning, achieving superior performance through MCTS-based training and GRPO-style optimization while reducing API costs.
Generated by…
31

Hugging Face Daily Papers · research · 14d ago
Native Active Perception as Reasoning for Omni-Modal Understanding
Abstract: OmniAgent is a novel omni-modal agent that addresses long video understanding by using an iterative observation-thought-action cycle with active perception, achieving superior performance compared to larger models through efficient selective processing.
Generated by…
24

Hugging Face Daily Papers · research · 14d ago
Reinforcing Dual-Path Reasoning in Spatial Vision Language Models
Abstract: A unified framework for spatial vision-language models that combines linguistic deduction and 3D geometric reasoning through reinforcement learning, enabling robust spatial reasoning across diverse tasks and domains.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct
Spatial…
9

Hugging Face Daily Papers · research · 14d ago
Seeing Before Reasoning: Decoupling Perception and Reasoning for Shortcut-Resilient Multimodal On-Policy Self-Distillation
Abstract: ViGOS is a visually grounded on-policy self-distillation framework for multimodal large language models that improves image-grounded behavior by using specialized teachers for different stages of reasoning and handling invalid rollouts.
Generated by…
8

arXiv — NLP / Computation & Language · research · 14d ago
Breaking the Solver Bottleneck: Training Task Generators at the Learnable Frontier
arXiv:2606.18284v1 Announce Type: cross
Abstract: The limiting resource for training agents via reinforcement learning (RL) is increasingly frontier task supply: valid, solvable tasks just difficult enough to train the current model. As reasoning and agentic models improve,…
21

arXiv — Machine Learning · research · 14d ago
Sparsity Curse: Understanding RLVR Model Parameter Space from Model Merging
arXiv:2606.18521v1 Announce Type: new
Abstract: Reinforcement Learning with Verifiable Reward (RLVR) has emerged as a powerful post-training paradigm that surpasses Supervised Fine-Tuning (SFT) in eliciting reasoning intelligence and resisting catastrophic forgetting.
Recent…
13

arXiv — Machine Learning · research · 14d ago
Learning from Own Solutions: Self-Conditioned Credit Assignment for Reinforcement Learning with Verifiable Rewards
arXiv:2606.18810v1 Announce Type: new
Abstract: Reinforcement learning with verifiable rewards (RLVR) has driven substantial progress in training LLMs for reasoning tasks, but representative methods such as GRPO assign uniform credit across all tokens, wasting gradient on…
11

arXiv — Machine Learning · research · 14d ago
Learning from Your Own Mistakes: Constructing Learnable Micro-Reflective Trajectories for Self-Distillation
arXiv:2606.18844v1 Announce Type: new
Abstract: Self-distillation improves reasoning in large language models by using the model's own rollouts as training signal, typically through implicit logit-level alignment that minimizes KL divergence toward a privileged target…
14

arXiv — NLP / Computation & Language · research · 14d ago
REVES: REvision and VErification--Augmented Training for Test-Time Scaling
arXiv:2606.18910v1 Announce Type: cross
Abstract: Test-time scaling via sequential revision has emerged as a powerful paradigm for enhancing Large Language Model (LLM) reasoning. However, standard post-training methods primarily optimize single-shot objectives, creating a…
27

arXiv — Machine Learning · research · 14d ago
EfficientRollout: System-Aware Self-Speculative Decoding for RL Rollouts
arXiv:2606.18967v1 Announce Type: new
Abstract: Reinforcement learning (RL) has become a representative post-training paradigm for LLMs, enabling strong reasoning and agentic capabilities.
However, rollout generation remains a dominant latency bottleneck because autoregressive…
25

arXiv — Machine Learning · research · 14d ago
Seeing Before Reasoning: Decoupling Perception and Reasoning for Shortcut-Resilient Multimodal On-Policy Self-Distillation
arXiv:2606.19120v1 Announce Type: new
Abstract: On-policy self-distillation (OPSD) trains a model on its own rollouts and uses a frozen copy to provide dense token-level targets conditioned on a reference target. This works well for LLM reasoning, but a direct extension to…
31

arXiv — NLP / Computation & Language · research · 14d ago
LLM Parameters for Math Across Languages: Shared or Separate?
arXiv:2606.18453v1 Announce Type: new
Abstract: Large language models (LLMs) exhibit substantial cross-lingual variation in mathematical reasoning performance, but it remains unclear whether these differences reflect language-specific parameters or a shared mechanism that…
27

arXiv — NLP / Computation & Language · research · 14d ago
Towards Scalable Customization and Deployment of Multi-Agent Systems for Enterprise Applications
arXiv:2606.18502v1 Announce Type: new
Abstract: Large language model (LLM)-based multi-agent systems demonstrate strong performance on complex reasoning and task execution, enabling broad enterprise applications. However, production deployment remains challenging due to…
38

Page 4 of 10