Hugging Face Daily Papers


Hugging Face Daily Papers research 20d ago

VIA-SD: Verification via Intra-Model Routing for Speculative Decoding

Abstract VIA-SD introduces a multi-tier speculative decoding framework that uses intra-model routing to reduce verification costs by employing slim submodels for medium-confidence token validation, achieving significant speedups over traditional approaches. Generated by…

32
Hugging Face Daily Papers research 20d ago

TreeSeeker: Tree-Structured Trial, Error, and Return in Deep Search

Abstract TreeSeeker is an inference-time framework that uses tree-structured search with branch-and-return control to manage exploration and exploitation in deep search tasks, improving performance through systematic trial-and-error decision making. Generated by…

23
Hugging Face Daily Papers research 20d ago

Flash-GMM: A Memory-Efficient Kernel for Scalable Soft Clustering

Abstract Flash-GMM introduces an efficient fused Triton kernel for Gaussian Mixture Models that achieves significant speedup and enables processing much larger datasets on a single GPU. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We present Flash-GMM, a fused Triton kernel for…

18
Hugging Face Daily Papers research 20d ago

Leveraging Morphology for Historical Script Metrological Analysis

Abstract A transformer-based architecture with prototype learning enables scalable paleographic measurements from historical documents using only line-level transcriptions, demonstrating its effectiveness on a 160-page codex with minimal training data requirements. Generated by…

37
Hugging Face Daily Papers research 20d ago

PianoKontext: Expressive Performance Rendering from Deadpan Context

Abstract PianoKontext generates variable-length piano performances by aligning MIDI scores with audio in latent space using DTW and DiT blocks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Expressive performance rendering (EPR) aims to generate realistic performances constrained…

12
Hugging Face Daily Papers research 20d ago

IDEAL: In-DEpth ALignment Makes A Discrete Representation AutoEncoder

Abstract Representation autoencoders using deep learning frameworks can improve image reconstruction quality by combining shallow and deep visual feature representations for better semantic richness and visual fidelity. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Built on…

31
Hugging Face Daily Papers research 20d ago

High-Fidelity Two-Step Image Generation via Teacher-Aligned End-to-End Distillation

Abstract A 2-step image generation model is developed through distillation from an 8-step teacher using distribution-aligned adversarial learning, step-decoupled parameterization, and end-to-end training with iterative regularization. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

33
Hugging Face Daily Papers research 20d ago

MiniMax Sparse Attention

Abstract MiniMax Sparse Attention enables efficient processing of ultra-long contexts in large language models through blockwise sparsity and optimized GPU execution, achieving significant speedups while maintaining performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

20
Hugging Face Daily Papers research 20d ago

VideoMDM: Towards 3D Human Motion Generation From 2D Supervision

Abstract VideoMDM trains 3D human motion priors from 2D poses using a diffusion framework with 2D reprojection loss and 3D motion regularizers, achieving near-3D supervised performance without requiring 3D ground truth. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We introduce…

5
Hugging Face Daily Papers research 20d ago

LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories

Abstract LabVLA, a vision-language-action model trained with a two-stage approach combining action token pretraining and flow matching, demonstrates superior performance on laboratory automation tasks through simulated data generation and robot-specific learning. Generated by…

18
Hugging Face Daily Papers research 20d ago

Visual Para-Thinker++: A Single-Policy Multi-Agent Framework for Visual Reasoning

Abstract A multi-agent framework with shared MLLM policy and role-specific training methods improves visual reasoning by reducing hallucinations and enabling efficient parallel processing. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Visual reasoning requires integrating…

6
Hugging Face Daily Papers research 20d ago

SG-OPD: Sign-Gated On-Policy Distillation via Sign-Consistency Gating and Phased Teacher Sampling

Abstract Sign-Gated On-Policy Distillation improves upon standard on-policy distillation by incorporating a binary verifier to filter teacher signals, resulting in better performance on mathematical reasoning tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct On-policy…

38
Hugging Face Daily Papers research 20d ago

Robust-U1: Can MLLMs Self-Recover Corrupted Visual Content for Robust Understanding?

Abstract Robust-U1 enhances multimodal large language models' robustness against visual corruptions through self-recovery capabilities that improve both visual quality and reasoning performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multimodal Large Language Models…

4
Hugging Face Daily Papers research 20d ago

EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge

Abstract EvoBrowseComp is an evolving benchmark with 800 contamination-free questions synthesized through a three-agent framework that ensures temporal freshness and prevents parametric memorization in search agent evaluation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Search…

26
Hugging Face Daily Papers research 20d ago

MaskAlign: Token-Subset Representation Alignment for Efficient Diffusion Training

Abstract MaskAlign, a token-subset representation alignment method, improves diffusion transformer training by reducing reliance on complete token sets and maintaining stable alignment behavior under perturbations. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Representation…

12
Hugging Face Daily Papers research 20d ago

EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments

Abstract EvoArena benchmark and EvoMem memory paradigm address the challenge of dynamic environments in LLM agents by modeling progressive updates and structured memory evolution, showing improved performance on evolving tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large…

5
Hugging Face Daily Papers research 20d ago

MuJoCo-Drones-Gym: A GPU-Accelerated Multi-Drone Simulator for Control and Reinforcement Learning

Abstract A Gymnasium-compatible multi-drone simulation environment built on MuJoCo physics engine that supports flexible physics models, action interfaces, and observation spaces for reinforcement learning applications. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Robotic…

35
Hugging Face Daily Papers research 20d ago

Demystifying Hidden-State Recurrence: Switchable Latent Reasoning with On-Policy Reinforcement Learning

Abstract A switchable latent reasoning framework uses explicit boundary tokens to enable trainable and interpretable latent reasoning through recurrent hidden states. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Latent chain-of-thought compresses reasoning by replacing visible…

24
Hugging Face Daily Papers research 20d ago

Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents

Abstract Evoflux enables compact language models to execute tool workflows more reliably by using evolutionary search to repair failed plans during inference, significantly improving execution feasibility compared to traditional fine-tuning methods. Generated by…

20
Hugging Face Daily Papers research 20d ago

WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces

Abstract WeaveBench presents a comprehensive benchmark for evaluating computer-use agents across multiple interfaces, revealing significant challenges in long-horizon task orchestration and highlighting the limitations of traditional performance assessment methods. Generated by…

38
Hugging Face Daily Papers research 20d ago

EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery

Abstract Environment engineering enhances autonomous scientific discovery by designing structured agent environments that optimize behaviors like exploration and collaboration while mitigating issues such as reward hacking and human oversight friction, as demonstrated by the…

35
Hugging Face Daily Papers research 20d ago

InterleaveThinker: Reinforcing Agentic Interleaved Generation

Abstract InterleaveThinker enables interleaved generation capabilities for image generators through a multi-agent pipeline with planner and critic agents, achieving performance comparable to state-of-the-art models while enhancing reasoning benchmarks. Generated by…

36
Hugging Face Daily Papers research 20d ago

FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents

Abstract A framework for creating shortcut-resistant training data for deep search agents by identifying and mitigating four shortcut risks in data synthesis processes. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Training deep search agents requires verifiable questions whose…

11
Hugging Face Daily Papers research 20d ago

N-GRPO: Embedding-Level Neighbor Mixing for Enhanced Policy Optimization

Abstract N-GRPO, a novel exploration strategy within GRPO framework, enhances mathematical reasoning in large language models through semantic neighbor mixing that maintains semantic consistency while injecting diversity. Generated by Qwen/Qwen2.5-Coder-32B-Instruct The success…

27
Hugging Face Daily Papers research 20d ago

SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

Abstract SpatialClaw is a training-free framework that uses code as an action interface to enable flexible, stateful spatial reasoning in vision-language models, achieving superior performance across diverse 3D/4D spatial reasoning tasks. Generated by…

36
Hugging Face Daily Papers research 20d ago

MoVerse: Real-Time Video World Modeling with Panoramic Gaussian Scaffold

Abstract MoVerse generates real-time interactive video from single images by creating 360° panoramas and 3D Gaussian scaffolds, enabling efficient rendering through diffusion-based techniques. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We present MoVerse, a real-time video…

22
Hugging Face Daily Papers research 20d ago

HarnessBridge: Learnable Bidirectional Controller for LLM Agent Harness

Abstract HarnessBridge is a learnable harness controller that parameterizes agent-environment interfaces through bidirectional projections, achieving performance comparable to specialized harnesses with reduced computational overhead. Generated by…

21
Hugging Face Daily Papers research 20d ago

Can Generalist Agents Automate Data Curation?

Abstract Automated data curation using generalist coding agents shows promise but requires structured scaffolding to achieve superior performance compared to traditional methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Curating training data is among the most consequential…

33
Hugging Face Daily Papers research 21d ago

Building Social World Models with Large Language Models

Abstract The Social World Model framework captures the evolution of social beliefs in response to events through temporal pattern mining and evidence lower bound optimization, without explicit human annotations. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Understanding and predicting…

33
Hugging Face Daily Papers research 21d ago

Which Models Are Our Models Built On? Auditing Invisible Dependencies in Modern LLMs

Abstract ModSleuth is an agentic system that recursively reconstructs large-scale dependency graphs for LLM development by analyzing public artifacts and resolving inconsistencies in documentation and artifact identities. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Modern LLM…

6
Hugging Face Daily Papers research 21d ago

ReVision: Scaling Computer-Use Agents via Temporal Visual Redundancy Reduction

Abstract ReVision improves computer-use agent efficiency by removing redundant visual patches from consecutive screenshots while preserving spatial structure, reducing token usage by 46% and improving success rates. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Computer-use…

10
Hugging Face Daily Papers research 21d ago

SparDA: Sparse Decoupled Attention for Efficient Long-Context LLM Inference

Abstract SparDA is a decoupled sparse attention architecture that improves long-context LLM inference by reducing KV cache bottlenecks and attention complexity through a Forecast projection for lookahead selection. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Sparse attention…

23
Hugging Face Daily Papers research 21d ago

APEX: A Network-Native Time-Series Foundation Model for Forecasting and Anomaly Detection for Wireless Edge Operations

Abstract APEX, a network-native transformer model, demonstrates superior forecasting performance for wireless network telemetry compared to existing foundation models and traditional methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Generic time-series foundation models transfer…

38
Hugging Face Daily Papers research 21d ago

Towards Diverse Scientific Hypothesis Search with Large Language Models

Abstract Evolutionary framework for hypothesis generation that improves diversity and quality through multi-temperature sampling and information exchange across search levels. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large language models (LLMs) are on the rise for…

14
Hugging Face Daily Papers research 21d ago

DRIFT: A Residual Flow Adapter for Decoding Continuous Outputs in Vision-Language Models

Abstract DRIFT is a framework that adapts pretrained vision-language models for continuous decoding tasks by combining coarse prediction with iterative refinement through flow matching, improving performance across perception and planning tasks. Generated by…

12
Hugging Face Daily Papers research 21d ago

Reroute, Don't Remove: Recoverable Visual Token Routing for Vision-Language Models

Abstract Vision-language models can improve grounding performance under aggressive token reduction by replacing irreversible visual-token pruning with recoverable routing that allows tokens to re-enter the processing pipeline at later stages. Generated by…

16
Hugging Face Daily Papers research 21d ago

Adaptive Multi-Resolution Procedural Knowledge Compression for Large Language Models

Abstract SKIM is an adaptive multi-resolution soft token compression framework that efficiently compresses procedural skills while maintaining task performance and enabling lightweight offline compression for frequently updated community skills. Generated by…

6
Hugging Face Daily Papers research 21d ago

τ-Rec: A Verifiable Benchmark for Agentic Recommender Systems

Abstract A benchmark for agentic recommender systems is introduced that uses verifiable rewards and controlled dialogue constraints to evaluate conversational agent reliability, revealing significant performance gaps among leading models. Generated by…

6
Hugging Face Daily Papers research 21d ago

On Subquadratic Architectures: From Applications to Principles

Abstract xLSTM demonstrates superior performance in sequence modeling tasks compared to Mamba-2 and Gated DeltaNet due to enhanced state tracking and memory dynamics. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Transformers dominate modern sequence modeling, but their quadratic…

4
Hugging Face Daily Papers research 21d ago

Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training

Abstract ART enables parameter-efficient fine-tuning of frozen multimodal language models by optimizing raw visual input through gradient backpropagation, achieving performance comparable to LoRA while supporting pre-compiled computational graphs. Generated by…

8
Hugging Face Daily Papers research 21d ago

TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning

Abstract TRACE is a rollout allocation framework that improves reward contrast in multi-turn agentic reinforcement learning by dynamically distributing resources across tree-structured rollouts based on prefix-level informativeness. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

27
Hugging Face Daily Papers research 21d ago

FlowLet: Conditional 3D Brain MRI Synthesis using Wavelet Flow Matching

Abstract FlowLet is a conditional generative framework that synthesizes age-conditioned 3D MRIs using flow matching in an invertible 3D wavelet domain, improving brain age prediction performance for underrepresented age groups. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Brain…

18
Hugging Face Daily Papers research 21d ago

POISE: Position-Aware Undetectable Skill Injection on LLM Agents

Abstract POISE is a stealthy skill-poisoning attack that embeds malicious triggers within benign-looking instructions, achieving high attack success rates while avoiding detection by LLM scanners that are overly sensitive to privileged tool operations. Generated by…

16
Hugging Face Daily Papers research 21d ago

Grammar-Constrained Decoding Can Jailbreak LLMs into Generating Malicious Code

Abstract Grammar-constrained decoding techniques used to ensure syntactic validity in code generation can be exploited as an attack surface, leading to the development of a jailbreak method called CodeSpear and a safety alignment approach named CodeShield. Generated by…

37
Hugging Face Daily Papers research 21d ago

Breaking the Bubble: Asynchronous Pipeline Parallel Training with Bounded Weight Inconsistency

Abstract PACI enables efficient asynchronous pipeline training by controlling forward/backward weight inconsistency through local gradient accumulation, achieving higher throughput and faster training time-to-accuracy without sacrificing stability or memory usage. Generated by…

9
Hugging Face Daily Papers research 21d ago

Time-Series Foundation Model Embeddings for Remaining Useful Life Estimation

Abstract A lightweight approach combining a frozen pretrained time-series foundation model with a simple regression head achieves superior RUL prediction performance compared to various baseline methods on industrial sensor data. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

15
Hugging Face Daily Papers research 21d ago

Large Language Models Are Overconfident in Their Own Responses

Abstract Instruction tuning degrades calibration in large language models, with chat templates exacerbating overconfidence through ownership bias, which can be mitigated by reframing model responses as user input during confidence assessment. Generated by…

22
Hugging Face Daily Papers research 21d ago

Distilling LLM Feedback for Lean Theorem Proving

Abstract Feedback Distillation improves post-training of reasoning models by using self-distillation with token-level supervision and privileged feedback from language models, offering better diversity and complementary benefits when combined with GRPO. Generated by…

38
Hugging Face Daily Papers research 21d ago

EvoTrainer: Co-Evolving LLM Policies and Training Harnesses for Autonomous Agentic Reinforcement Learning

Abstract EvoTrainer autonomously evolves both language model policies and training harnesses through empirical feedback, demonstrating superior performance in complex reasoning and coding tasks compared to traditional handcrafted approaches. Generated by…

6
Hugging Face Daily Papers research 21d ago

Reason, Then Re-reason: Cross-view Revisiting Improves Spatial Reasoning

Abstract A training-free framework for spatial reasoning from egocentric videos that enables revisiting conclusions through synthesized novel-view videos generated from predicted 3D geometry. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Spatial reasoning from egocentric videos…

11
