Hugging Face Daily Papers

Hugging Face Daily Papers research 1d ago

Illuminating Unified Multimodal Model for Free-form Interleaved Text-Image Generation

Abstract ILLUME-X is a unified multimodal paradigm that enhances text-image generation through improved data efficiency, stable training processes, and comprehensive evaluation metrics. Generated by Qwen/Qwen2.5-Coder-32B-Instruct The advancement of generative AI models capable…

17
Hugging Face Daily Papers research 2d ago

Beyond IID: How General Are Tabular Foundation Models, Really?

Abstract Tabular foundation models show varying performance across different data conditions, with traditional methods still outperforming newer approaches on complex, large-scale datasets. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Foundation models for predictive machine…

15
Hugging Face Daily Papers research 2d ago

Agentic Abstention: Do Agents Know When to Stop Instead of Act?

Abstract Agentic abstention involves determining when an AI agent should cease interaction under uncertainty, requiring sequential decision-making across multiple environments and task types. Generated by Qwen/Qwen2.5-Coder-32B-Instruct LLM agents are expected to act over…

13
Hugging Face Daily Papers research 2d ago

ZooClaw-FashionSigLIP2: Distilled Fine-tuning for Robust Fashion Retrieval

Abstract A fashion-specialized vision-language model achieves superior retrieval performance through full fine-tuning with knowledge distillation and weight interpolation, outperforming existing methods on a new benchmark while addressing structural biases in existing datasets.…

32
Hugging Face Daily Papers research 2d ago

Large-Scale Tunnel Air-Ground Collaboration With FLISP: Fast LiDAR-IMU Synchronized Path Planner

Abstract Hydropower tunnel inspection is critical for infrastructure integrity yet remains inefficient and hazardous using manual methods. We propose FLISP (Fast LiDAR-IMU Synchronized Path Planner), a mapless planning framework for cooperative UGV-UAV inspection. Unlike…

16
Hugging Face Daily Papers research 2d ago

Beyond Drug Discovery: The Nanotechnology Molecular Optimization (NMO) Benchmark

Abstract The Nanotechnology Molecular Optimization (NMO) Benchmark introduces physics-based molecular design challenges that require new generative model approaches, moving beyond drug-discovery-focused metrics to enable scientific discovery in nanotechnology. Generated by…

24
Hugging Face Daily Papers research 2d ago

TheoremGraph: Bridging Formal and Informal Mathematics

Abstract A unified mathematical dependency graph connects informal and formal mathematics through semantic embedding and automated extraction from arXiv papers and Lean projects. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Mathematical knowledge is organized around statements…

32
Hugging Face Daily Papers research 2d ago

Learning Transferable Dynamics Priors from Action to World Modeling

Abstract Action-conditioned world modeling enables transferable dynamics priors for robot learning through pretraining on large-scale manipulation data, supporting both simulator-based policy evaluation and video-action prediction. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We…

27
Hugging Face Daily Papers research 2d ago

The Surprising Effectiveness of Video Diffusion Models for Hand Motion Reconstruction

Abstract ViDiHand uses pretrained video diffusion model representations with hand-overlay rendering to reconstruct 4D hand motion directly from video frames without detectors or optimization. Generated by Qwen/Qwen2.5-Coder-32B-Instruct 4D hand motion reconstruction from…

31
Hugging Face Daily Papers research 2d ago

Interleaved Speech Language Models Latently Work In Text

Abstract Interleaved speech-text language models exhibit an implicit transcription phase where text tokens become decodable in intermediate layers, followed by text-based prediction before speech domain transformation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Speech language…

16
Hugging Face Daily Papers research 2d ago

TUA-Bench: A Benchmark for General-Purpose Terminal-Use Agents

Abstract TUA-Bench presents a comprehensive benchmark for evaluating general-purpose terminal-use agents across diverse digital activities and specialized workflows, revealing significant performance gaps among current frontier agents. Generated by…

4
Hugging Face Daily Papers research 2d ago

Video-MME-Logical: A Controlled Diagnostic Benchmark for Video Temporal-Logical Reasoning

Abstract A new benchmark evaluates multimodal large language models' ability to reason over dynamic visual evidence through controlled temporal-logical operations rather than simple object recognition. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Recent interest in multimodal…

25
Hugging Face Daily Papers research 2d ago

Unlocking the Visual Record of Materials Science: A Large-Scale Multimodal Dataset from Scientific Literature

Abstract A novel pipeline called MatMMExtract is introduced that processes compound scientific figures into individual panels and generates structured annotations using large language models, creating a comprehensive dataset for vision-language learning in materials science.…

16
Hugging Face Daily Papers research 2d ago

Trimming the Long-Tail of Visual World Modeling Evaluation

Abstract Current visual world models demonstrate limited generalization beyond common physical interactions, struggling with rare and irregular scenarios despite achieving realism on standard benchmarks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Physical interactions follow a…

28
Hugging Face Daily Papers research 2d ago

LiveEdit: Towards Real-Time Diffusion-Based Streaming Video Editing

Abstract A novel streaming video editing framework enables causal, frame-by-frame editing with stable long-horizon preservation and real-time responsiveness through a three-stage distillation pipeline and AR-oriented mask cache. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

24
Hugging Face Daily Papers research 2d ago

Geometric Stability of Neural Population Codes: Regional Variation, Behavioral Relevance, and Circuit Dependence

Abstract Geometric stability measures the consistency of pairwise stimulus distances across trials, revealing a distinct aspect of neural representation that differs from temporal stability and decoding accuracy. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Current models of…

27
Hugging Face Daily Papers research 2d ago

Walking in the Implicit: Interactive World Exploration via Neural Scene Representation

Abstract NeuWorld enables efficient interactive video generation by representing scenes as compact neural implicit states and using a transformer VAE with diffusion transformer for trajectory-conditioned rendering. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Interactive video…

25
Hugging Face Daily Papers research 2d ago

SafePyramid: A Hierarchical Benchmark for In-context Policy Guardrailing

Abstract SafePyramid benchmark evaluates guardrail systems' ability to identify safety violations through in-context policy specification across multiple domains and complexity levels. Generated by Qwen/Qwen2.5-Coder-32B-Instruct In real-world applications, guardrails are often…

5
Hugging Face Daily Papers research 2d ago

PoseShield: Neural Collision Fields for Human Self-Collision Resolution

Abstract PoseShield addresses self-collision issues in SMPL-based human pose estimation by applying neural collision constraints in pose space through constrained optimization and Eikonal regularization. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Self-collision remains a…

15
Hugging Face Daily Papers research 2d ago

Orca: The World is in Your Mind

Abstract Orca establishes a unified world latent space through next-state-prediction modeling using multimodal data and demonstrates superior performance in downstream tasks compared to specialized baselines. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We introduce Orca, an…

38
Hugging Face Daily Papers research 2d ago

ReFreeKV: Towards Threshold-Free KV Cache Compression

Abstract ReFreeKV addresses the limitations of threshold-dependent KV cache pruning by introducing a threshold-free approach that adaptively allocates compression budgets while maintaining full-cache performance across diverse datasets and model sizes. Generated by…

31
Hugging Face Daily Papers research 2d ago

Monte Carlo Energy Aggregation for Mobile 3D Gaussian Splatting

Abstract Flux-GS enables real-time high-fidelity 3D Gaussian Splatting on mobile platforms through efficient lighting representation, attribute-conditioned enhancement, and multi-view densification strategies. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Recent advances in 3D…

10
Hugging Face Daily Papers research 2d ago

Nemotron-Labs-Diffusion-Image: Advancing Masked Discrete Diffusion for High-Resolution Image Synthesis

Abstract A masked discrete diffusion model for text-to-image synthesis that addresses limitations in token refinement and training efficiency through novel mechanisms and optimizations. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We propose Nemotron-Labs-Diffusion-Image, a…

25
Hugging Face Daily Papers research 2d ago

TACO: Tool-Augmented Credit Optimization for Agentic Tool Use

Abstract Tool-Augmented Credit Optimization (TACO) improves multimodal agent performance by distinguishing useful, redundant, or misleading code operations through dual advantage channels: Differential Answer-Probe Reward for individual tool contribution and Outcome-Gated…

38
Hugging Face Daily Papers research 2d ago

ReasoningLens: Hierarchical Visualization and Diagnostic Auditing for Large Reasoning Models

Abstract ReasoningLens is an open-source framework that provides hierarchical visualization and diagnostic auditing for complex reasoning chains in large reasoning models, enabling structured analysis and error detection through interactive hierarchies and automated auditing.…

21
Hugging Face Daily Papers research 2d ago

Scaling the Horizon, Not the Parameters: Reaching Trillion-Parameter Performance with a 35B Agent

Abstract Agents-A1, a 35B Mixture-of-Experts Agentic Model, achieves trillion-parameter-level performance through long-horizon trajectory scaling and heterogeneous agent ability scaling via a three-stage training approach involving supervised fine-tuning, domain-level teacher…

28
Hugging Face Daily Papers research 2d ago

PolicyGuard: A Dialogue-Grounded Sub-Agent Verifier for Policy Adherence in LLM Agents

Abstract POLICYGUARD is a sub-agent verifier that enhances LLM agent policy adherence by providing contextual reasoning and conversation-specific feedback across multi-turn interactions. Generated by Qwen/Qwen2.5-Coder-32B-Instruct LLM agents handle user requests on behalf of…

11
Hugging Face Daily Papers research 2d ago

Focusing on What Matters: Saliency-Harnessing Accurate Routing for Diffusion MoE

Abstract SharpMoE addresses routing inefficiencies in diffusion models by using clean latent features to guide salient token identification and employs trajectory routing loss for precise compute allocation during multi-step denoising. Generated by…

29
Hugging Face Daily Papers research 2d ago

How Good Can Linear Models Be for Time-Series Forecasting?

Abstract Research demonstrates that preprocessing optimizations, particularly in context length, normalization, and regularization, can significantly improve time-series forecasting accuracy more effectively than scaling model architectures. Generated by…

32
Hugging Face Daily Papers research 2d ago

GUICrafter: Weakly-Supervised GUI Agent Leveraging Massive Unannotated Screenshots

Abstract GUICrafter addresses GUI agent data challenges through a weakly-supervised approach using unannotated screenshots and a two-stage curriculum learning framework for visual grounding and reinforcement learning calibration. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

10
Hugging Face Daily Papers research 2d ago

Cognitive Episodes in LLM Reasoning Traces Enable Interpretable Human Item Difficulty Prediction

Abstract Epi2Diff framework transforms LRM reasoning traces into cognitive episodes to predict human item difficulty more accurately than existing methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Predicting human item difficulty is central to educational assessment, where…

8
Hugging Face Daily Papers research 2d ago

Bridging VideoQA and Video-Guided Agentic Tasks via Generalized Keyframe Extraction

Abstract A new benchmark evaluates multimodal large language models' ability to understand video content and perform GUI tasks, while a novel keyframe extraction method improves performance on both video question answering and video-guided agentic tasks. Generated by…

28
Hugging Face Daily Papers research 2d ago

MIMFlow: Integrating Masked Image Modeling with Normalizing Flows for End-to-End Image Generation

Abstract MIMFlow combines Normalizing Flows with Masked Image Modeling to improve generative modeling by decoupling semantic representation from pixel-level details, achieving better performance with fewer tokens. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Normalizing Flows…

37
Hugging Face Daily Papers research 2d ago

AsyncOPD: How Stale Can On-Policy Distillation Be?

Abstract Asynchronous on-policy distillation addresses training bottlenecks in large language model post-training by decoupling rollout generation from learner updates, though it introduces challenges with stale policy data that require specialized solutions. Generated by…

4
Hugging Face Daily Papers research 4d ago

COrigami: An AI Pipeline for Co-Designing Flat-Foldable Visually Recognisable Origami

Abstract A computational origami system generates crease patterns from natural language using AI-driven optimization and aesthetic evaluation, enabling human-AI collaboration in mathematically constrained design. Generated by Qwen/Qwen2.5-Coder-32B-Instruct While generative AI…

11
Hugging Face Daily Papers research 4d ago

Fast LeWorldModel

Abstract Fast-LeWM accelerates visual planning by replacing autoregressive rollout with parallel action-prefix prediction, reducing computational costs and latency accumulation during long-horizon predictions. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Joint-Embedding…

20
Hugging Face Daily Papers research 5d ago

ABACUS: Adapting Unified Foundation Model for Bridging Image Count Understanding and Generation

Abstract ABACUS is a unified vision-language model that performs object counting and related tasks through innovative spatial grounding, boundary-aware counting policies, and self-critical learning strategies. Generated by Qwen/Qwen2.5-Coder-32B-Instruct ABACUS is a unified…

16
Hugging Face Daily Papers research 5d ago

Neglected Free Lunch from Post-training: Progress Advantage for LLM Agents

Abstract Reinforcement learning post-training enables effective step-level scoring for language models without requiring dedicated reward model training by deriving an implicit advantage function called progress advantage. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Process…

6
Hugging Face Daily Papers research 5d ago

Qwen-Image-Agent: Bridging the Context Gap in Real-World Image Generation

Abstract A unified agentic framework called Qwen-Image-Agent is proposed to address the context gap in text-to-image generation by progressively constructing complete generation context through planning, reasoning, searching, and memory mechanisms. Generated by…

22
Hugging Face Daily Papers research 5d ago

Information-Aware KV Cache Compression for Long Reasoning

Abstract InfoKV is an entropy-aware KV cache compression framework that enhances long-context reasoning in LLMs by incorporating information-theoretic signals alongside attention weights. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Reasoning capability has advanced rapidly in…

10
Hugging Face Daily Papers research 5d ago

EO-WM: A Physically Informed World Model for Probabilistic Earth Observation Forecasting

Abstract EO-WM is a video diffusion transformer for multispectral Earth Observation forecasting that incorporates physically informed conditioning frameworks to better capture weather-driven uncertainties in land-surface dynamics. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

10
Hugging Face Daily Papers research 5d ago

LISA: Likelihood Score Alignment for Visual-condition Controllable Generation

Abstract Score-based generative modeling reveals that side networks contribute likelihood scores to conditional control, leading to improved training efficiency through likelihood score alignment regularization. Generated by Qwen/Qwen2.5-Coder-32B-Instruct The prevalent…

36
Hugging Face Daily Papers research 5d ago

Running the Gauntlet: Re-evaluating the Capabilities of Agents Beyond Familiar Environments

Abstract A web-based benchmark evaluates agent generalization across challenging scenarios, revealing significant gaps between current agentic systems and human performance in temporal perception, graphical understanding, and 3D reasoning. Generated by…

10
Hugging Face Daily Papers research 5d ago

When Does Combining Language Models Help? A Co-Failure Ceiling on Routing, Voting, and Mixture-of-Agents Across 67 Frontier Models

Abstract Multi-model systems face fundamental accuracy limits determined by the rate at which all models fail simultaneously, regardless of their individual correlations or ensemble strategies. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multi-model LLM systems such as routing,…

11
Hugging Face Daily Papers research 6d ago

PhysiFormer: Learning to Simulate Mechanics in World Space

Abstract PhysiFormer uses coordinate-space diffusion to generate physically-plausible 3D object motions without explicit inductive biases, enabling efficient multi-object reasoning and generalization to complex materials and geometries. Generated by…

30
Hugging Face Daily Papers research 6d ago

CoffeeBench: Benchmarking Long-Horizon LLM Agents in Heterogeneous Multi-Agent Economies

Abstract CoffeeBench evaluates LLM agents in a multi-agent economic simulation where firms interact over 90 days to maximize profits, revealing differences in communication patterns and performance among various models. Generated by Qwen/Qwen2.5-Coder-32B-Instruct As LLM agents…

4
Hugging Face Daily Papers research 6d ago

Discretizing Reward Models

Abstract Reward models in reinforcement learning suffer from oversensitivity issues where they assign different scores to equally good responses, leading to poor policy learning, but this can be mitigated through discretization techniques that maintain discriminative ability…

16
Hugging Face Daily Papers research 6d ago

JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting

Abstract JetSpec is a speculative decoding framework that combines efficient forward drafting with causal conditioning to improve LLM inference speed and acceptance rates across various benchmarks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Speculative decoding (SD)…

17
Hugging Face Daily Papers research 6d ago

How Post-Training Shapes Biological Reasoning Models

Abstract Post-training stages in biological reasoning models differently affect generalization, with continued pre-training aligning models with biological language, supervised fine-tuning improving in-domain performance but reducing out-of-domain generalization, and…

8
Hugging Face Daily Papers research 6d ago

Hallucination in World Models is Predictable and Preventable

Abstract World models exhibit hallucinations in low-data regions of state-action space, which can be detected and mitigated using data-centric signals and coverage-aware sampling techniques. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Modern generative world models render…

25
