Hugging Face Daily Papers
500 articles archived
-
Hugging Face Daily Papers research 6h ago
MemLearner: Learning to Query Context memory for Video World Models
Abstract MemLearner improves video world models by using learning-based adaptive context querying with query tokens to enhance scene consistency and memory in long video sequences with occlusions and dynamic objects. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Video World…
24 -
Hugging Face Daily Papers research 12h ago
SpheRoPE: Zero-Shot Optimization-Free 360 Panorama Generation with Spherical RoPE
Abstract A novel zero-shot framework injects spherical priors into pre-trained diffusion transformers for 360 panoramic generation, using spherical RoPE and semantic distortion guidance to overcome topological constraints without training or optimization. Generated by…
35 -
Hugging Face Daily Papers research 13h ago
TRIAGE: Role-Typed Credit Assignment for Agentic Reinforcement Learning
Abstract TRIAGE introduces a role-typed credit assignment framework that enhances agentic reinforcement learning by providing more nuanced credit assignment than standard GRPO methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Agentic reinforcement learning requires assigning…
26 -
Hugging Face Daily Papers research 14h ago
SWE-INTERACT: Reimagining SWE Benchmarks as User-Driven Long-Horizon Coding Sessions
Abstract SWE-Interact presents a testbed that evaluates coding agents in realistic multi-turn, user-driven software engineering scenarios, revealing significant gaps between single-turn performance and interactive task completion. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We…
6 -
Hugging Face Daily Papers research 14h ago
Play2Perfect: What Matters in Dexterous Play Pretraining for Precise Assembly?
Abstract A reinforcement learning framework called Play2Perfect enables sample-efficient robotic assembly tasks by first learning general manipulation skills through playful interaction with diverse objects, then adapting these skills for precise assembly through fine-tuning.…
34 -
Hugging Face Daily Papers research 14h ago
PolyFlow: Continuous Topology Embedding Flow Matching for Artist-style Mesh Generation
Abstract PolyFlow introduces a continuous mesh representation using a topology embedder and applies flow-matching with Transformers for parallel mesh generation, achieving faster inference and precise resolution control compared to autoregressive methods. Generated by…
5 -
Hugging Face Daily Papers research 15h ago
Hierarchical Experimentalist Agents
Abstract HExA enables large language models to improve through active experimentation and skill learning in novel domains without requiring training or external supervision. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large language models (LLMs) are increasingly used to take…
24 -
Hugging Face Daily Papers research 16h ago
Are We Measuring Strategy or Phrasing? The Gap Between Surface- and Approach-Level Diversity in LLM Math Reasoning
Abstract Approach-level diversity in LLM mathematical reasoning captures strategic variation in problem-solving methods, revealing limitations of surface-level diversity metrics and highlighting challenges in directly optimizing diverse reasoning approaches. Generated by…
11 -
Hugging Face Daily Papers research 16h ago
Does VLA Even Know the Basics? Measuring Commonsense and World Knowledge Retention in Vision-Language-Action Models
Abstract Act2Answer protocol evaluates embodied vision-language-action models by having agents answer questions through physical actions, revealing knowledge retention and generalization patterns across different semantic categories. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
35 -
Hugging Face Daily Papers research 16h ago
Lexical Consensus: Grounded Word Learning and Shared Meaning in Artificial Agents
Abstract Grounded word learning experiments using visual embeddings and lexical learners reveal that perceptual distance, rather than semantic relatedness, determines acquisition success, with distinct patterns in naming and retrieval performance. Generated by…
34 -
Hugging Face Daily Papers research 18h ago
MOPD: Multi-Teacher On-Policy Distillation for Capability Integration in LLM Post-Training
Abstract Multi-teacher On-Policy Distillation (MOPD) enables efficient integration of multiple domain capabilities in large language models through specialized reinforcement learning teachers and on-policy distillation, achieving superior performance over existing methods.…
33 -
Hugging Face Daily Papers research 18h ago
Goku: A Million-Scale Universal Dataset and Benchmark for Instruction-Based Video Editing
Abstract A large-scale video editing dataset and model are introduced that support multi-task and structural manipulations through advanced data synthesis and network architectures. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Existing instruction-based video editing datasets…
38 -
Hugging Face Daily Papers research 19h ago
QVal: Cheaply Evaluating Dense Supervision Signals for Long-Horizon LLM Agents
Abstract A testbed called QVal is introduced for evaluating dense supervision signals in long-horizon LLM agent tasks by measuring how well method scores align with Q-values, enabling fair comparison of different supervision approaches without training. Generated by…
22 -
Hugging Face Daily Papers research 20h ago
FlexiSLM: A Dynamic and Controllable Frame Rate Spoken Language Model
Abstract Flexible Spoken Language Model (FlexiSLM) introduces dynamic frame rate capabilities for speech input and output, achieving superior performance over fixed-frame-rate models while enabling controllable inference speed. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Spoken…
15 -
Hugging Face Daily Papers research 21h ago
Managing Procedural Memory in LLM Agents: Control, Adaptation, and Evaluation
Abstract Procedural memory enhances LLM agents on workplace tasks through skill transfer across roles and models, with varying generalization capabilities affecting deployment strategies. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Procedural memory is increasingly used to…
22 -
Hugging Face Daily Papers research 22h ago
SkillHone: A Harness for Continual Agent Skill Evolution Through Persistent Decision History
Abstract SkillHone enables continuous evolution of agent skills by maintaining persistent decision histories and incorporating practice feedback for improved performance across research and tool-mediated analysis tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Agent skills…
35 -
Hugging Face Daily Papers research 22h ago
DataEvolver: Self-Evolving Multi-Agent Data Construction for Text-Rich Image Generation
Abstract DataEvolver is a self-evolving multi-agent framework that improves text-rich image generation by leveraging feedback from rejected samples to iteratively enhance data quality. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Text-rich image generation is one of the most…
11 -
Hugging Face Daily Papers research 22h ago
MuSViT: A Foundation Vision Model for Sheet Music Representation
Abstract MuSViT is a vision transformer-based foundation model pre-trained on millions of sheet music pages that demonstrates superior performance in music score recognition and symbol detection tasks through both linear probing and fine-tuning approaches. Generated by…
10 -
Hugging Face Daily Papers research 23h ago
Scenes as Objects, Not Primitives: Instance-Structured 3D Tokenization from Unposed Views
Abstract A feed-forward framework decomposes 3D scenes into instance-structured token groups from multi-view images, enabling direct object-level reconstruction, segmentation, and manipulation without 3D annotations. Generated by Qwen/Qwen2.5-Coder-32B-Instruct A 3D scene is…
38 -
Hugging Face Daily Papers research 23h ago
RedVox: Safety and Fairness Gaps in Speech Models Across Languages
Abstract A multilingual safety and fairness benchmark for speech models reveals persistent vulnerabilities across languages and naturalistic conditions. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Speech-capable models are increasingly deployed in real-world applications across…
36 -
Hugging Face Daily Papers research 23h ago
Xiaomi-GUI-0 Technical Report
Abstract A native multimodal GUI agent trained in real-device environments demonstrates superior performance and stability compared to traditional benchmark-based approaches. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Graphical user interface (GUI) agents build on…
7 -
Hugging Face Daily Papers research 1d ago
GEAR: Guided End-to-End AutoRegression for Image Synthesis
Abstract GEAR trains a vector-quantized tokenizer and autoregressive generator jointly end-to-end using representation alignment, overcoming non-differentiability issues through a dual read-out approach that improves convergence speed and feature quality. Generated by…
36 -
Hugging Face Daily Papers research 1d ago
Little Brains, Big Feats: Exploring Compact Language Models
Abstract Small language models can effectively perform retrieval-augmented generation tasks directly on-device without GPU acceleration. Generated by Qwen/Qwen2.5-Coder-32B-Instruct While large language models have been dominating the research landscape recently, small language…
13 -
Hugging Face Daily Papers research 1d ago
Multi-Block Diffusion Language Models
Abstract Multi-Block Diffusion Language Models extend single-block diffusion to concurrent block decoding with improved training strategies and optimized decoding algorithms. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Block Diffusion Language Models (BD-LMs) improve…
35 -
Hugging Face Daily Papers research 1d ago
Reinforcement Learning with Metacognitive Feedback Elicits Faithful Uncertainty Expression in LLMs
Abstract Reinforcement learning with metacognitive feedback and metacognitive data selection improve large language model calibration by enabling accurate self-assessment of performance and uncertainty. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Metacognition is a critical…
38 -
Hugging Face Daily Papers research 1d ago
TerraDiT-Ω: Unified Spatial Control for Satellite Image Synthesis with Any Geospatial Primitive
Abstract TerraDiT-Ω generates satellite imagery from native geospatial primitives using Geometry-Aware Local Attention, enabling flexible conditioning and improved downstream geospatial tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Generative models have achieved…
36 -
Hugging Face Daily Papers research 1d ago
PhotoQuilt: Training-Free Arbitrary-Resolution Photomosaics via Bootstrapped Tiled Denoising
Abstract PhotoQuilt is a training-free framework that generates high-resolution photomosaics by combining global layout composition with separate tile generation in latent space, overcoming limitations of diffusion models in balancing local detail and global structure. Generated…
5 -
Hugging Face Daily Papers research 1d ago
BrainJanus: A Unified Model for Understanding and Generation across Brain, Vision, and Language
Abstract BrainJanus represents the first unified brain model integrating brain, vision, and language through a shared Omni space, enabling bidirectional mapping between neural activity and sensory stimuli via a tokenized representation and autoregressive architecture. Generated…
38 -
Hugging Face Daily Papers research 1d ago
BlockPilot: Instance-Adaptive Policy Learning for Diffusion-based Speculative Decoding
Abstract Speculative decoding with adaptive block size selection improves inference efficiency by predicting optimal block sizes from prefilling representations, achieving significant speedup with minimal overhead. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Speculative…
30 -
Hugging Face Daily Papers research 1d ago
AVTok: 1D Unified Tokenization for Holistic Audio-Video Generation
Abstract AVTok is a unified tokenizer for audio-video generation that uses a dual-stream transformer architecture with shared encoder-decoder and modal-specific queries to create compact one-dimensional latent representations. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
21 -
Hugging Face Daily Papers research 1d ago
Evolution Fine-Tuning: Learning to Discover Across 371 Optimization Tasks
Abstract Evolutionary fine-tuning enables large language models to develop cross-task problem-solving capabilities by learning from search trajectories, demonstrating improved performance on mathematical conjectures and optimization tasks. Generated by…
11 -
Hugging Face Daily Papers research 1d ago
DOPD: Dual On-policy Distillation
Abstract DOPD addresses privilege illusion in on-policy distillation by dynamically routing token-level supervision between teacher and student policies based on advantage gaps and probabilities, improving capability transfer in large language and vision-language models. Generated by…
6 -
Hugging Face Daily Papers research 1d ago
Dockerless: Environment-Free Program Verifier for Coding Agents
Abstract A Dockerless environment-free agentic patch verifier improves code patch evaluation accuracy and enables effective post-training without execution-based verification costs. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Program verifiers play a central role in training…
21 -
Hugging Face Daily Papers research 1d ago
LUMOS: A Semantic Operating-System Layer for Accessibility-Grounded AI Agents
Abstract LUMOS provides a semantic interaction layer that converts operating system metadata into machine-readable formats, enabling AI agents to interact more efficiently with computer interfaces than through traditional visual methods. Generated by…
34 -
Hugging Face Daily Papers research 1d ago
One Forward Beats Two: InnerZoom for Accurate and Efficient GUI Grounding
Abstract InnerZoom addresses GUI grounding challenges by preserving target-region awareness across decoder layers through a single-forward pass that bridges cross-layer evidence, achieving state-of-the-art performance with reduced computational cost. Generated by…
16 -
Hugging Face Daily Papers research 1d ago
OSWorld2.0: Benchmarking Computer Use Agents on Long-Horizon Real-World Tasks
Abstract OSWorld 2.0 presents a comprehensive benchmark for evaluating computer-use agents through complex, real-world workflows that reveal current limitations in agent reasoning and task completion. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Existing computer-use benchmarks…
24 -
Hugging Face Daily Papers research 1d ago
MirrorPPR: Exemplar-Based Portrait Photo Retouching
Abstract An exemplar-based portrait retouching framework using a Diffusion Transformer with LoRA adaptation and self-augmented training data achieves superior quality and identity preservation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct While text-guided image editing has made…
24 -
Hugging Face Daily Papers research 1d ago
Delayed Verification Destabilizes Multi-Agent LLM Belief: Instability Thresholds and Optimal Corrector Placement
Abstract Delayed verification in multi-agent LLM systems can cause instability leading to oscillations, but grounded factual answering stabilizes the system by making truth an absorbing boundary. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multi-agent large language model (LLM)…
14 -
Hugging Face Daily Papers research 1d ago
Drop-Then-Recovery: How Redundant Are Vision-Language-Action Models?
Abstract Research reveals that language backbones in Vision-Language-Action models are highly redundant for robotic manipulation tasks, while vision and action pathways are more critical, suggesting a need for deliberate capacity allocation in future architectures. Generated by…
11 -
Hugging Face Daily Papers research 1d ago
LLM Program Optimization via Retrieval Augmented Search
Abstract Blackbox adaptation methods using retrieval-augmented search and atomic edit decomposition improve program optimization performance for both C++ and Python code. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Recent work has demonstrated the potential of large language…
19 -
Hugging Face Daily Papers research 1d ago
Mind the Heads: Topological Representation Alignment for Multimodal LLMs
Abstract HeRA aligns individual attention heads in MLLMs to preserve local neighborhood relationships across modalities, improving vision-centric task performance and reducing visual hallucinations. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Representation alignment has…
27 -
Hugging Face Daily Papers research 1d ago
SWE-Together: Evaluating Coding Agents in Interactive User Sessions
Abstract SWE-Together is a multi-turn coding benchmark created from real user-agent interactions, featuring a reactive LLM simulator to evaluate agents based on both final correctness and interaction efficiency. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Most coding-agent…
32 -
Hugging Face Daily Papers research 1d ago
DreamForge-World 0.1 Preview: A Low-Compute Real-Time Controllable World Model
Abstract DreamForge-World 0.1 Preview adapts a video generation architecture with a residual action pathway to enable real-time interactive world simulation on consumer hardware with low computational requirements. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We present…
18 -
Hugging Face Daily Papers research 1d ago
RocketSmith: Agentic Additive Manufacturing of High-Powered Rockets
Abstract An agentic system using large language models automates high-power rocket design processes, enabling successful flight testing with consistent simulation results. Generated by Qwen/Qwen2.5-Coder-32B-Instruct RocketSmith is an agentic system which intelligently automates…
9 -
Hugging Face Daily Papers research 1d ago
A Gravitational Interpretation of Fine-Tuning Reversion
Abstract Post-alignment safety degradation arises from geometric properties of training history, where fine-tuning reversion follows a persistent direction defined by early training dynamics. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Fine-tuning on harmless data can partially…
35 -
Hugging Face Daily Papers research 1d ago
One Model, Many Latencies: Universal Speech Enhancement for Diverse Real-Time Applications
Abstract A universal speech enhancement model offers configurable algorithmic and computational latency controls using parallel convolutions and early-exit mechanisms. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Different real-time speech applications impose distinct latency…
9 -
Hugging Face Daily Papers research 1d ago
SAM2Matting: Generalized Image and Video Matting
Abstract SAM2Matting advances video matting by decoupling tracking and matting tasks through a tracker-to-matting framework that leverages foundational trackers with region-proposal bridges and dedicated matting heads. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Despite…
36 -
Hugging Face Daily Papers research 1d ago
One-Step Gradient Delay is Not a Barrier for Large-Scale Asynchronous Pipeline Parallel LLM Pretraining
Abstract Asynchronous pipeline parallelism with PipeDream-2BW can achieve near-synchronous performance through optimizer selection and error feedback correction, overcoming traditional stability concerns. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Modern large-scale LLM…
29 -
Hugging Face Daily Papers research 1d ago
One Scene, Two Depths: Probing Geometric Ambiguity in Monocular Foundation Models
Abstract A faithful 3D world representation should account for layered geometry, where a single camera ray may contain multiple visible and geometrically valid surfaces. Monocular depth estimation, however, reduces this structure to one scalar depth per pixel. Transparent scenes…
29 -
Hugging Face Daily Papers research 1d ago
RaysUp: Ultra-light Universal Feature Upsampling via Geometry-Aware Ray Representation
Abstract RaysUp is a lightweight, task-agnostic feature upsampling framework that reconstructs high-resolution features using geometry-aware ray domain techniques with improved efficiency and accuracy. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Pre-trained Vision Foundation…
37