Hugging Face Daily Papers
500 articles archived
Hugging Face Daily Papers research 1d ago
Illuminating Unified Multimodal Model for Free-form Interleaved Text-Image Generation
Abstract ILLUME-X is a unified multimodal paradigm that enhances text-image generation through improved data efficiency, stable training processes, and comprehensive evaluation metrics. Generated by Qwen/Qwen2.5-Coder-32B-Instruct The advancement of generative AI models capable…
17 -
Hugging Face Daily Papers research 2d ago
Beyond IID: How General Are Tabular Foundation Models, Really?
Abstract Tabular foundation models show varying performance across different data conditions, with traditional methods still outperforming newer approaches on complex, large-scale datasets. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Foundation models for predictive machine…
15 -
Hugging Face Daily Papers research 2d ago
Agentic Abstention: Do Agents Know When to Stop Instead of Act?
Abstract Agentic abstention involves determining when an AI agent should cease interaction under uncertainty, requiring sequential decision-making across multiple environments and task types. Generated by Qwen/Qwen2.5-Coder-32B-Instruct LLM agents are expected to act over…
13 -
Hugging Face Daily Papers research 2d ago
ZooClaw-FashionSigLIP2: Distilled Fine-tuning for Robust Fashion Retrieval
Abstract A fashion-specialized vision-language model achieves superior retrieval performance through full fine-tuning with knowledge distillation and weight interpolation, outperforming existing methods on a new benchmark while addressing structural biases in existing datasets.…
32 -
Hugging Face Daily Papers research 2d ago
Large-Scale Tunnel Air-Ground Collaboration With FLISP: Fast LiDAR-IMU Synchronized Path Planner
Abstract Hydropower tunnel inspection is critical for infrastructure integrity yet remains inefficient and hazardous using manual methods. We propose FLISP (Fast LiDAR-IMU Synchronized Path Planner), a mapless planning framework for cooperative UGV-UAV inspection. Unlike…
16 -
Hugging Face Daily Papers research 2d ago
Beyond Drug Discovery: The Nanotechnology Molecular Optimization (NMO) Benchmark
Abstract The Nanotechnology Molecular Optimization (NMO) Benchmark introduces physics-based molecular design challenges that require new generative model approaches, moving beyond drug-discovery-focused metrics to enable scientific discovery in nanotechnology. Generated by…
24 -
Hugging Face Daily Papers research 2d ago
TheoremGraph: Bridging Formal and Informal Mathematics
Abstract A unified mathematical dependency graph connects informal and formal mathematics through semantic embedding and automated extraction from arXiv papers and Lean projects. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Mathematical knowledge is organized around statements…
32 -
Hugging Face Daily Papers research 2d ago
Learning Transferable Dynamics Priors from Action to World Modeling
Abstract Action-conditioned world modeling enables transferable dynamics priors for robot learning through pretraining on large-scale manipulation data, supporting both simulator-based policy evaluation and video-action prediction. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We…
27 -
Hugging Face Daily Papers research 2d ago
The Surprising Effectiveness of Video Diffusion Models for Hand Motion Reconstruction
Abstract ViDiHand uses pretrained video diffusion model representations with hand-overlay rendering to reconstruct 4D hand motion directly from video frames without detectors or optimization. Generated by Qwen/Qwen2.5-Coder-32B-Instruct 4D hand motion reconstruction from…
31 -
Hugging Face Daily Papers research 2d ago
Interleaved Speech Language Models Latently Work In Text
Abstract Interleaved speech-text language models exhibit an implicit transcription phase where text tokens become decodable in intermediate layers, followed by text-based prediction before speech domain transformation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Speech language…
16 -
Hugging Face Daily Papers research 2d ago
TUA-Bench: A Benchmark for General-Purpose Terminal-Use Agents
Abstract TUA-Bench presents a comprehensive benchmark for evaluating general-purpose terminal-use agents across diverse digital activities and specialized workflows, revealing significant performance gaps among current frontier agents. Generated by…
4 -
Hugging Face Daily Papers research 2d ago
Video-MME-Logical: A Controlled Diagnostic Benchmark for Video Temporal-Logical Reasoning
Abstract A new benchmark evaluates multimodal large language models' ability to reason over dynamic visual evidence through controlled temporal-logical operations rather than simple object recognition. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Recent interest in multimodal…
25 -
Hugging Face Daily Papers research 2d ago
Unlocking the Visual Record of Materials Science: A Large-Scale Multimodal Dataset from Scientific Literature
Abstract A novel pipeline called MatMMExtract is introduced that processes compound scientific figures into individual panels and generates structured annotations using large language models, creating a comprehensive dataset for vision-language learning in materials science.…
16 -
Hugging Face Daily Papers research 2d ago
Trimming the Long-Tail of Visual World Modeling Evaluation
Abstract Current visual world models demonstrate limited generalization beyond common physical interactions, struggling with rare and irregular scenarios despite achieving realism on standard benchmarks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Physical interactions follow a…
28 -
Hugging Face Daily Papers research 2d ago
LiveEdit: Towards Real-Time Diffusion-Based Streaming Video Editing
Abstract A novel streaming video editing framework enables causal, frame-by-frame editing with stable long-horizon preservation and real-time responsiveness through a three-stage distillation pipeline and AR-oriented mask cache. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
24 -
Hugging Face Daily Papers research 2d ago
Geometric Stability of Neural Population Codes: Regional Variation, Behavioral Relevance, and Circuit Dependence
Abstract Geometric stability measures the consistency of pairwise stimulus distances across trials, revealing a distinct aspect of neural representation that differs from temporal stability and decoding accuracy. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Current models of…
27 -
Hugging Face Daily Papers research 2d ago
Walking in the Implicit: Interactive World Exploration via Neural Scene Representation
Abstract NeuWorld enables efficient interactive video generation by representing scenes as compact neural implicit states and using a transformer VAE with diffusion transformer for trajectory-conditioned rendering. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Interactive video…
25 -
Hugging Face Daily Papers research 2d ago
SafePyramid: A Hierarchical Benchmark for In-context Policy Guardrailing
Abstract SafePyramid benchmark evaluates guardrail systems' ability to identify safety violations through in-context policy specification across multiple domains and complexity levels. Generated by Qwen/Qwen2.5-Coder-32B-Instruct In real-world applications, guardrails are often…
5 -
Hugging Face Daily Papers research 2d ago
PoseShield: Neural Collision Fields for Human Self-Collision Resolution
Abstract PoseShield addresses self-collision issues in SMPL-based human pose estimation by applying neural collision constraints in pose space through constrained optimization and Eikonal regularization. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Self-collision remains a…
15 -
Hugging Face Daily Papers research 2d ago
Orca: The World is in Your Mind
Abstract Orca establishes a unified world latent space through next-state-prediction modeling using multimodal data and demonstrates superior performance in downstream tasks compared to specialized baselines. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We introduce Orca, an…
38 -
Hugging Face Daily Papers research 2d ago
ReFreeKV: Towards Threshold-Free KV Cache Compression
Abstract ReFreeKV addresses the limitations of threshold-dependent KV cache pruning by introducing a threshold-free approach that adaptively allocates compression budgets while maintaining full-cache performance across diverse datasets and model sizes. Generated by…
31 -
Hugging Face Daily Papers research 2d ago
Monte Carlo Energy Aggregation for Mobile 3D Gaussian Splatting
Abstract Flux-GS enables real-time high-fidelity 3D Gaussian Splatting on mobile platforms through efficient lighting representation, attribute-conditioned enhancement, and multi-view densification strategies. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Recent advances in 3D…
10 -
Hugging Face Daily Papers research 2d ago
Nemotron-Labs-Diffusion-Image: Advancing Masked Discrete Diffusion for High-Resolution Image Synthesis
Abstract A masked discrete diffusion model for text-to-image synthesis that addresses limitations in token refinement and training efficiency through novel mechanisms and optimizations. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We propose Nemotron-Labs-Diffusion-Image, a…
25 -
Hugging Face Daily Papers research 2d ago
TACO: Tool-Augmented Credit Optimization for Agentic Tool Use
Abstract Tool-Augmented Credit Optimization (TACO) improves multimodal agent performance by distinguishing useful, redundant, or misleading code operations through dual advantage channels: Differential Answer-Probe Reward for individual tool contribution and Outcome-Gated…
38 -
Hugging Face Daily Papers research 2d ago
ReasoningLens: Hierarchical Visualization and Diagnostic Auditing for Large Reasoning Models
Abstract ReasoningLens is an open-source framework that provides hierarchical visualization and diagnostic auditing for complex reasoning chains in large reasoning models, enabling structured analysis and error detection through interactive hierarchies and automated auditing.…
21 -
Hugging Face Daily Papers research 2d ago
Scaling the Horizon, Not the Parameters: Reaching Trillion-Parameter Performance with a 35B Agent
Abstract Agents-A1, a 35B Mixture-of-Experts Agentic Model, achieves trillion-parameter-level performance through long-horizon trajectory scaling and heterogeneous agent ability scaling via a three-stage training approach involving supervised fine-tuning, domain-level teacher…
28 -
Hugging Face Daily Papers research 2d ago
PolicyGuard: A Dialogue-Grounded Sub-Agent Verifier for Policy Adherence in LLM Agents
Abstract POLICYGUARD is a sub-agent verifier that enhances LLM agent policy adherence by providing contextual reasoning and conversation-specific feedback across multi-turn interactions. Generated by Qwen/Qwen2.5-Coder-32B-Instruct LLM agents handle user requests on behalf of…
11 -
Hugging Face Daily Papers research 2d ago
Focusing on What Matters: Saliency-Harnessing Accurate Routing for Diffusion MoE
Abstract SharpMoE addresses routing inefficiencies in diffusion models by using clean latent features to guide salient token identification and employs trajectory routing loss for precise compute allocation during multi-step denoising. Generated by…
29 -
Hugging Face Daily Papers research 2d ago
How Good Can Linear Models Be for Time-Series Forecasting?
Abstract Research demonstrates that preprocessing optimizations, particularly in context length, normalization, and regularization, can improve time-series forecasting accuracy more than scaling model architectures. Generated by…
32 -
Hugging Face Daily Papers research 2d ago
GUICrafter: Weakly-Supervised GUI Agent Leveraging Massive Unannotated Screenshots
Abstract GUICrafter addresses GUI agent data challenges through a weakly-supervised approach using unannotated screenshots and a two-stage curriculum learning framework for visual grounding and reinforcement learning calibration. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
10 -
Hugging Face Daily Papers research 2d ago
Cognitive Episodes in LLM Reasoning Traces Enable Interpretable Human Item Difficulty Prediction
Abstract Epi2Diff framework transforms LRM reasoning traces into cognitive episodes to predict human item difficulty more accurately than existing methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Predicting human item difficulty is central to educational assessment, where…
8 -
Hugging Face Daily Papers research 2d ago
Bridging VideoQA and Video-Guided Agentic Tasks via Generalized Keyframe Extraction
Abstract A new benchmark evaluates multimodal large language models' ability to understand video content and perform GUI tasks, while a novel keyframe extraction method improves performance on both video question answering and video-guided agentic tasks. Generated by…
28 -
Hugging Face Daily Papers research 2d ago
MIMFlow: Integrating Masked Image Modeling with Normalizing Flows for End-to-End Image Generation
Abstract MIMFlow combines Normalizing Flows with Masked Image Modeling to improve generative modeling by decoupling semantic representation from pixel-level details, achieving better performance with fewer tokens. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Normalizing Flows…
37 -
Hugging Face Daily Papers research 2d ago
AsyncOPD: How Stale Can On-Policy Distillation Be?
Abstract Asynchronous on-policy distillation addresses training bottlenecks in large language model post-training by decoupling rollout generation from learner updates, though it introduces challenges with stale policy data that require specialized solutions. Generated by…
4 -
Hugging Face Daily Papers research 4d ago
COrigami: An AI Pipeline for Co-Designing Flat-Foldable Visually Recognisable Origami
Abstract A computational origami system generates crease patterns from natural language using AI-driven optimization and aesthetic evaluation, enabling human-AI collaboration in mathematically constrained design. Generated by Qwen/Qwen2.5-Coder-32B-Instruct While generative AI…
11 -
Hugging Face Daily Papers research 4d ago
Fast LeWorldModel
Abstract Fast-LeWM accelerates visual planning by replacing autoregressive rollout with parallel action-prefix prediction, reducing computational costs and latency accumulation during long-horizon predictions. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Joint-Embedding…
20 -
Hugging Face Daily Papers research 5d ago
ABACUS: Adapting Unified Foundation Model for Bridging Image Count Understanding and Generation
Abstract ABACUS is a unified vision-language model that performs object counting and related tasks through innovative spatial grounding, boundary-aware counting policies, and self-critical learning strategies. Generated by Qwen/Qwen2.5-Coder-32B-Instruct ABACUS is a unified…
16 -
Hugging Face Daily Papers research 5d ago
Neglected Free Lunch from Post-training: Progress Advantage for LLM Agents
Abstract Reinforcement learning post-training enables effective step-level scoring for language models without requiring dedicated reward model training by deriving an implicit advantage function called progress advantage. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Process…
6 -
Hugging Face Daily Papers research 5d ago
Qwen-Image-Agent: Bridging the Context Gap in Real-World Image Generation
Abstract A unified agentic framework called Qwen-Image-Agent is proposed to address the context gap in text-to-image generation by progressively constructing complete generation context through planning, reasoning, searching, and memory mechanisms. Generated by…
22 -
Hugging Face Daily Papers research 5d ago
Information-Aware KV Cache Compression for Long Reasoning
Abstract InfoKV is an entropy-aware KV cache compression framework that enhances long-context reasoning in LLMs by incorporating information-theoretic signals alongside attention weights. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Reasoning capability has advanced rapidly in…
10 -
Hugging Face Daily Papers research 5d ago
EO-WM: A Physically Informed World Model for Probabilistic Earth Observation Forecasting
Abstract EO-WM is a video diffusion transformer for multispectral Earth Observation forecasting that incorporates physically informed conditioning frameworks to better capture weather-driven uncertainties in land-surface dynamics. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
10 -
Hugging Face Daily Papers research 5d ago
LISA: Likelihood Score Alignment for Visual-condition Controllable Generation
Abstract Score-based generative modeling reveals that side networks contribute likelihood scores to conditional control, leading to improved training efficiency through likelihood score alignment regularization. Generated by Qwen/Qwen2.5-Coder-32B-Instruct The prevalent…
36 -
Hugging Face Daily Papers research 5d ago
Running the Gauntlet: Re-evaluating the Capabilities of Agents Beyond Familiar Environments
Abstract A web-based benchmark evaluates agent generalization across challenging scenarios, revealing significant gaps between current agentic systems and human performance in temporal perception, graphical understanding, and 3D reasoning. Generated by…
10 -
Hugging Face Daily Papers research 6d ago
PhysiFormer: Learning to Simulate Mechanics in World Space
Abstract PhysiFormer uses coordinate-space diffusion to generate physically-plausible 3D object motions without explicit inductive biases, enabling efficient multi-object reasoning and generalization to complex materials and geometries. Generated by…
30 -
Hugging Face Daily Papers research 6d ago
CoffeeBench: Benchmarking Long-Horizon LLM Agents in Heterogeneous Multi-Agent Economies
Abstract CoffeeBench evaluates LLM agents in a multi-agent economic simulation where firms interact over 90 days to maximize profits, revealing differences in communication patterns and performance among various models. Generated by Qwen/Qwen2.5-Coder-32B-Instruct As LLM agents…
4 -
Hugging Face Daily Papers research 6d ago
Discretizing Reward Models
Abstract Reward models in reinforcement learning can be oversensitive, assigning different scores to equally good responses and degrading policy learning, but this can be mitigated through discretization techniques that maintain discriminative ability…
16 -
Hugging Face Daily Papers research 6d ago
JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting
Abstract JetSpec is a speculative decoding framework that combines efficient forward drafting with causal conditioning to improve LLM inference speed and acceptance rates across various benchmarks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Speculative decoding (SD)…
17 -
Hugging Face Daily Papers research 6d ago
How Post-Training Shapes Biological Reasoning Models
Abstract Post-training stages affect generalization in biological reasoning models differently: continued pre-training aligns models with biological language, supervised fine-tuning improves in-domain performance but reduces out-of-domain generalization, and…
8 -
Hugging Face Daily Papers research 6d ago
Hallucination in World Models is Predictable and Preventable
Abstract World models exhibit hallucinations in low-data regions of state-action space, which can be detected and mitigated using data-centric signals and coverage-aware sampling techniques. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Modern generative world models render…
25