Hugging Face Daily Papers
500 articles archived
-
Hugging Face Daily Papers research 20d ago
VIA-SD: Verification via Intra-Model Routing for Speculative Decoding
Abstract VIA-SD introduces a multi-tier speculative decoding framework that uses intra-model routing to reduce verification costs by employing slim submodels for medium-confidence token validation, achieving significant speedups over traditional approaches. Generated by…
32 -
Hugging Face Daily Papers research 20d ago
TreeSeeker: Tree-Structured Trial, Error, and Return in Deep Search
Abstract TreeSeeker is an inference-time framework that uses tree-structured search with branch-and-return control to manage exploration and exploitation in deep search tasks, improving performance through systematic trial-and-error decision making. Generated by…
23 -
Hugging Face Daily Papers research 20d ago
Flash-GMM: A Memory-Efficient Kernel for Scalable Soft Clustering
Abstract Flash-GMM introduces an efficient fused Triton kernel for Gaussian Mixture Models that achieves significant speedup and enables processing much larger datasets on a single GPU. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We present Flash-GMM, a fused Triton kernel for…
18 -
Hugging Face Daily Papers research 20d ago
Leveraging Morphology for Historical Script Metrological Analysis
Abstract A transformer-based architecture with prototype learning enables scalable paleographic measurements from historical documents using only line-level transcriptions, demonstrating its effectiveness on a 160-page codex with minimal training data requirements. Generated by…
37 -
Hugging Face Daily Papers research 20d ago
PianoKontext: Expressive Performance Rendering from Deadpan Context
Abstract PianoKontext generates variable-length piano performances by aligning MIDI scores with audio in latent space using DTW and DiT blocks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Expressive performance rendering (EPR) aims to generate realistic performances constrained…
12 -
Hugging Face Daily Papers research 20d ago
IDEAL: In-DEpth ALignment Makes A Discrete Representation AutoEncoder
Abstract Representation autoencoders using deep learning frameworks can improve image reconstruction quality by combining shallow and deep visual feature representations for better semantic richness and visual fidelity. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Built on…
31 -
Hugging Face Daily Papers research 20d ago
High-Fidelity Two-Step Image Generation via Teacher-Aligned End-to-End Distillation
Abstract A 2-step image generation model is developed through distillation from an 8-step teacher using distribution-aligned adversarial learning, step-decoupled parameterization, and end-to-end training with iterative regularization. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
33 -
Hugging Face Daily Papers research 20d ago
MiniMax Sparse Attention
Abstract MiniMax Sparse Attention enables efficient processing of ultra-long contexts in large language models through blockwise sparsity and optimized GPU execution, achieving significant speedups while maintaining performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
20 -
Hugging Face Daily Papers research 20d ago
VideoMDM: Towards 3D Human Motion Generation From 2D Supervision
Abstract VideoMDM trains 3D human motion priors from 2D poses using a diffusion framework with 2D reprojection loss and 3D motion regularizers, achieving near-3D supervised performance without requiring 3D ground truth. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We introduce…
5 -
Hugging Face Daily Papers research 20d ago
LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories
Abstract LabVLA, a vision-language-action model trained with a two-stage approach combining action token pretraining and flow matching, demonstrates superior performance on laboratory automation tasks through simulated data generation and robot-specific learning. Generated by…
18 -
Hugging Face Daily Papers research 20d ago
Visual Para-Thinker++: A Single-Policy Multi-Agent Framework for Visual Reasoning
Abstract A multi-agent framework with shared MLLM policy and role-specific training methods improves visual reasoning by reducing hallucinations and enabling efficient parallel processing. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Visual reasoning requires integrating…
6 -
Hugging Face Daily Papers research 20d ago
SG-OPD: Sign-Gated On-Policy Distillation via Sign-Consistency Gating and Phased Teacher Sampling
Abstract Sign-Gated On-Policy Distillation improves upon standard on-policy distillation by incorporating a binary verifier to filter teacher signals, resulting in better performance on mathematical reasoning tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct On-policy…
38 -
Hugging Face Daily Papers research 20d ago
Robust-U1: Can MLLMs Self-Recover Corrupted Visual Content for Robust Understanding?
Abstract Robust-U1 enhances multimodal large language models' robustness against visual corruptions through self-recovery capabilities that improve both visual quality and reasoning performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multimodal Large Language Models…
4 -
Hugging Face Daily Papers research 20d ago
EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge
Abstract EvoBrowseComp is an evolving benchmark with 800 contamination-free questions synthesized through a three-agent framework that ensures temporal freshness and prevents parametric memorization in search agent evaluation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Search…
26 -
Hugging Face Daily Papers research 20d ago
MaskAlign: Token-Subset Representation Alignment for Efficient Diffusion Training
Abstract MaskAlign, a token-subset representation alignment method, improves diffusion transformer training by reducing reliance on complete token sets and maintaining stable alignment behavior under perturbations. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Representation…
12 -
Hugging Face Daily Papers research 20d ago
EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments
Abstract The EvoArena benchmark and the EvoMem memory paradigm address the challenge of dynamic environments for LLM agents by modeling progressive updates and structured memory evolution, showing improved performance on evolving tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large…
5 -
Hugging Face Daily Papers research 20d ago
MuJoCo-Drones-Gym: A GPU-Accelerated Multi-Drone Simulator for Control and Reinforcement Learning
Abstract A Gymnasium-compatible multi-drone simulation environment built on the MuJoCo physics engine supports flexible physics models, action interfaces, and observation spaces for reinforcement learning applications. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Robotic…
35 -
Hugging Face Daily Papers research 20d ago
Demystifying Hidden-State Recurrence: Switchable Latent Reasoning with On-Policy Reinforcement Learning
Abstract A switchable latent reasoning framework uses explicit boundary tokens to enable trainable and interpretable latent reasoning through recurrent hidden states. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Latent chain-of-thought compresses reasoning by replacing visible…
24 -
Hugging Face Daily Papers research 20d ago
Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents
Abstract Evoflux enables compact language models to execute tool workflows more reliably by using evolutionary search to repair failed plans during inference, significantly improving execution feasibility compared to traditional fine-tuning methods. Generated by…
20 -
Hugging Face Daily Papers research 20d ago
WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces
Abstract WeaveBench presents a comprehensive benchmark for evaluating computer-use agents across multiple interfaces, revealing significant challenges in long-horizon task orchestration and highlighting the limitations of traditional performance assessment methods. Generated by…
38 -
Hugging Face Daily Papers research 20d ago
EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery
Abstract Environment engineering enhances autonomous scientific discovery by designing structured agent environments that optimize behaviors like exploration and collaboration while mitigating issues such as reward hacking and human oversight friction, as demonstrated by the…
35 -
Hugging Face Daily Papers research 20d ago
InterleaveThinker: Reinforcing Agentic Interleaved Generation
Abstract InterleaveThinker enables interleaved generation capabilities for image generators through a multi-agent pipeline with planner and critic agents, achieving performance comparable to state-of-the-art models while improving results on reasoning benchmarks. Generated by…
36 -
Hugging Face Daily Papers research 20d ago
FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents
Abstract A framework for creating shortcut-resistant training data for deep search agents by identifying and mitigating four shortcut risks in data synthesis processes. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Training deep search agents requires verifiable questions whose…
11 -
Hugging Face Daily Papers research 20d ago
N-GRPO: Embedding-Level Neighbor Mixing for Enhanced Policy Optimization
Abstract N-GRPO, a novel exploration strategy within the GRPO framework, enhances mathematical reasoning in large language models through semantic neighbor mixing that maintains semantic consistency while injecting diversity. Generated by Qwen/Qwen2.5-Coder-32B-Instruct The success…
27 -
Hugging Face Daily Papers research 20d ago
SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning
Abstract SpatialClaw is a training-free framework that uses code as an action interface to enable flexible, stateful spatial reasoning in vision-language models, achieving superior performance across diverse 3D/4D spatial reasoning tasks. Generated by…
36 -
Hugging Face Daily Papers research 20d ago
MoVerse: Real-Time Video World Modeling with Panoramic Gaussian Scaffold
Abstract MoVerse generates real-time interactive video from single images by creating 360° panoramas and 3D Gaussian scaffolds, enabling efficient rendering through diffusion-based techniques. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We present MoVerse, a real-time video…
22 -
Hugging Face Daily Papers research 20d ago
HarnessBridge: Learnable Bidirectional Controller for LLM Agent Harness
Abstract HarnessBridge, a learnable harness controller, parameterizes agent-environment interfaces through bidirectional projections, achieving performance comparable to specialized harnesses with reduced computational overhead. Generated by…
21 -
Hugging Face Daily Papers research 20d ago
Can Generalist Agents Automate Data Curation?
Abstract Automated data curation using generalist coding agents shows promise but requires structured scaffolding to achieve superior performance compared to traditional methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Curating training data is among the most consequential…
33 -
Hugging Face Daily Papers research 21d ago
Building Social World Models with Large Language Models
Abstract The Social World Model framework captures the evolution of social beliefs in response to events through temporal pattern mining and evidence lower bound optimization, without explicit human annotations. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Understanding and predicting…
33 -
Hugging Face Daily Papers research 21d ago
Which Models Are Our Models Built On? Auditing Invisible Dependencies in Modern LLMs
Abstract ModSleuth is an agentic system that recursively reconstructs large-scale dependency graphs for LLM development by analyzing public artifacts and resolving inconsistencies in documentation and artifact identities. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Modern LLM…
6 -
Hugging Face Daily Papers research 21d ago
ReVision: Scaling Computer-Use Agents via Temporal Visual Redundancy Reduction
Abstract ReVision improves computer-use agent efficiency by removing redundant visual patches from consecutive screenshots while preserving spatial structure, reducing token usage by 46% and improving success rates. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Computer-use…
10 -
Hugging Face Daily Papers research 21d ago
SparDA: Sparse Decoupled Attention for Efficient Long-Context LLM Inference
Abstract SparDA is a decoupled sparse attention architecture that improves long-context LLM inference by reducing KV cache bottlenecks and attention complexity through a Forecast projection for lookahead selection. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Sparse attention…
23 -
Hugging Face Daily Papers research 21d ago
APEX: A Network-Native Time-Series Foundation Model for Forecasting and Anomaly Detection for Wireless Edge Operations
Abstract APEX, a network-native transformer model, demonstrates superior forecasting performance for wireless network telemetry compared to existing foundation models and traditional methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Generic time-series foundation models transfer…
38 -
Hugging Face Daily Papers research 21d ago
Towards Diverse Scientific Hypothesis Search with Large Language Models
Abstract An evolutionary framework for hypothesis generation improves diversity and quality through multi-temperature sampling and information exchange across search levels. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large language models (LLMs) are on the rise for…
14 -
Hugging Face Daily Papers research 21d ago
DRIFT: A Residual Flow Adapter for Decoding Continuous Outputs in Vision-Language Models
Abstract DRIFT is a framework that adapts pretrained vision-language models for continuous decoding tasks by combining coarse prediction with iterative refinement through flow matching, improving performance across perception and planning tasks. Generated by…
12 -
Hugging Face Daily Papers research 21d ago
Reroute, Don't Remove: Recoverable Visual Token Routing for Vision-Language Models
Abstract Vision-language models can improve grounding performance under aggressive token reduction by replacing irreversible visual-token pruning with recoverable routing that allows tokens to re-enter the processing pipeline at later stages. Generated by…
16 -
Hugging Face Daily Papers research 21d ago
Adaptive Multi-Resolution Procedural Knowledge Compression for Large Language Models
Abstract SKIM is an adaptive multi-resolution soft token compression framework that efficiently compresses procedural skills while maintaining task performance and enabling lightweight offline compression for frequently updated community skills. Generated by…
6 -
Hugging Face Daily Papers research 21d ago
τ-Rec: A Verifiable Benchmark for Agentic Recommender Systems
Abstract A benchmark for agentic recommender systems is introduced that uses verifiable rewards and controlled dialogue constraints to evaluate conversational agent reliability, revealing significant performance gaps among leading models. Generated by…
6 -
Hugging Face Daily Papers research 21d ago
On Subquadratic Architectures: From Applications to Principles
Abstract xLSTM demonstrates superior performance in sequence modeling tasks compared to Mamba-2 and Gated DeltaNet due to enhanced state tracking and memory dynamics. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Transformers dominate modern sequence modeling, but their quadratic…
4 -
Hugging Face Daily Papers research 21d ago
Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training
Abstract ART enables parameter-efficient fine-tuning of frozen multimodal language models by optimizing raw visual input through gradient backpropagation, achieving performance comparable to LoRA while supporting pre-compiled computational graphs. Generated by…
8 -
Hugging Face Daily Papers research 21d ago
TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning
Abstract TRACE is a rollout allocation framework that improves reward contrast in multi-turn agentic reinforcement learning by dynamically distributing resources across tree-structured rollouts based on prefix-level informativeness. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
27 -
Hugging Face Daily Papers research 21d ago
FlowLet: Conditional 3D Brain MRI Synthesis using Wavelet Flow Matching
Abstract FlowLet is a conditional generative framework that synthesizes age-conditioned 3D MRIs using flow matching in an invertible 3D wavelet domain, improving brain age prediction performance for underrepresented age groups. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Brain…
18 -
Hugging Face Daily Papers research 21d ago
POISE: Position-Aware Undetectable Skill Injection on LLM Agents
Abstract POISE is a stealthy skill-poisoning attack that embeds malicious triggers within benign-looking instructions, achieving high attack success rates while avoiding detection by LLM scanners that are overly sensitive to privileged tool operations. Generated by…
16 -
Hugging Face Daily Papers research 21d ago
Grammar-Constrained Decoding Can Jailbreak LLMs into Generating Malicious Code
Abstract Grammar-constrained decoding techniques used to ensure syntactic validity in code generation can be exploited as an attack surface, leading to the development of a jailbreak method called CodeSpear and a safety alignment approach named CodeShield. Generated by…
37 -
Hugging Face Daily Papers research 21d ago
Breaking the Bubble: Asynchronous Pipeline Parallel Training with Bounded Weight Inconsistency
Abstract PACI enables efficient asynchronous pipeline training by controlling forward/backward weight inconsistency through local gradient accumulation, achieving higher throughput and faster training time-to-accuracy without sacrificing stability or memory usage. Generated by…
9 -
Hugging Face Daily Papers research 21d ago
Time-Series Foundation Model Embeddings for Remaining Useful Life Estimation
Abstract A lightweight approach combining a frozen pretrained time-series foundation model with a simple regression head achieves superior RUL prediction performance compared to various baseline methods on industrial sensor data. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
15 -
Hugging Face Daily Papers research 21d ago
Large Language Models Are Overconfident in Their Own Responses
Abstract Instruction tuning degrades calibration in large language models, with chat templates exacerbating overconfidence through ownership bias, which can be mitigated by reframing model responses as user input during confidence assessment. Generated by…
22 -
Hugging Face Daily Papers research 21d ago
Distilling LLM Feedback for Lean Theorem Proving
Abstract Feedback Distillation improves post-training of reasoning models by using self-distillation with token-level supervision and privileged feedback from language models, offering better diversity and complementary benefits when combined with GRPO. Generated by…
38 -
Hugging Face Daily Papers research 21d ago
EvoTrainer: Co-Evolving LLM Policies and Training Harnesses for Autonomous Agentic Reinforcement Learning
Abstract EvoTrainer autonomously evolves both language model policies and training harnesses through empirical feedback, demonstrating superior performance in complex reasoning and coding tasks compared to traditional handcrafted approaches. Generated by…
6 -
Hugging Face Daily Papers research 21d ago
Reason, Then Re-reason: Cross-view Revisiting Improves Spatial Reasoning
Abstract A training-free framework for spatial reasoning from egocentric videos that enables revisiting conclusions through synthesized novel-view videos generated from predicted 3D geometry. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Spatial reasoning from egocentric videos…
11