Hugging Face Daily Papers
500 articles archived
-
Hugging Face Daily Papers research 14d ago
CEO-Bench: Can Agents Play the Long Game?
Abstract CEO-Bench evaluates language model agents' ability to manage a simulated startup over 500 days, testing their proficiency in long-term planning, noise handling, adaptability, and multi-task coordination through a Python interface. Generated by…
5 -
Hugging Face Daily Papers research 14d ago
Trust the Right Teacher: Quality-Aware Self-Distillation for GUI Grounding
Abstract Quality-aware self-distillation improves vision-language model performance for GUI grounding by enhancing coordinate-token teacher signals through correctness-aware gating and probability scaling. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Graphical user interface…
38 -
Hugging Face Daily Papers research 14d ago
IndustryBench-MIPU: Benchmarking Multi-Image Attribute Value Extraction for Industrial Products
Abstract IndustryBench-MIPU is introduced as the first large-scale benchmark for multi-image industrial product understanding, focusing on structured attribute extraction from heterogeneous product images to evaluate multimodal models' ability to recover dense technical…
24 -
Hugging Face Daily Papers research 14d ago
Kairos: A Native World Model Stack for Physical AI
Abstract Kairos is a native world model framework that learns from diverse experiences, maintains persistent states through hybrid temporal attention, and supports efficient deployment for physical AI applications. Generated by Qwen/Qwen2.5-Coder-32B-Instruct World models are…
33 -
Hugging Face Daily Papers research 14d ago
Learning User Simulators with Turing Rewards
Abstract A reinforcement learning approach using Turing test-based rewards trains language models to generate responses indistinguishable from human users in conversational and forum discussion settings. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Learning to simulate human…
26 -
Hugging Face Daily Papers research 14d ago
Physics-IQ Verified
Abstract A systematic evaluation of the Physics-IQ benchmark reveals limitations in measuring physical understanding of video generative models, leading to improvements in prompt quality and sample-level scoring that enhance reliability for assessing physically accurate video…
29 -
Hugging Face Daily Papers research 14d ago
Guava: An Effective and Universal Harness for Embodied Manipulation
Abstract A harness framework for embodied tool use combines high-level reasoning with external modules, enabling compact models to perform complex manipulation tasks with minimal training data. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Language models trained on large-scale…
15 -
Hugging Face Daily Papers research 14d ago
Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games
Abstract A new benchmark suite called RNG-Bench is introduced to evaluate multimodal foundation models' ability to reconstruct past observations and use them for decision-making in multi-step interactions, featuring two games with controlled difficulty parameters and a memory…
23 -
Hugging Face Daily Papers research 14d ago
Sumi: Open Uniform Diffusion Language Model from Scratch
Abstract A large-scale uniform diffusion language model pretrained from scratch demonstrates competitive performance on knowledge and reasoning tasks while highlighting differences in commonsense reasoning compared to autoregressive models. Generated by…
15 -
Hugging Face Daily Papers research 14d ago
ActWorld: From Explorable to Interactive World Model via Action-Aware Memory
Abstract ActWorld extends navigation-centric interactive world models to support object interaction through a chunk-autoregressive framework with hierarchical action-aware memory and persistent memory banks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Interactive world models…
9 -
Hugging Face Daily Papers research 14d ago
Speaking the Language of Science: Toward a General-Purpose Generative Foundation Model for the Natural Sciences
Abstract A unified scientific generative language model encodes diverse scientific objects and spatial interactions as token sequences, demonstrating strong performance across multiple domains through autoregressive next-token prediction. Generated by…
16 -
Hugging Face Daily Papers research 14d ago
Beyond Scalar Distances: Semantic Attribute Gradients from Frozen MLLMs for Visual Embeddings
Abstract The SAGA framework uses multimodal large language models to provide attribute-aware supervision for vision encoders through Group Relative Policy Optimization, improving zero-shot image retrieval performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Vision encoders for…
21 -
Hugging Face Daily Papers research 15d ago
Self-Evolving Visual Questioner
Abstract A vision-language model autonomously improves its question-generation capabilities through self-evolution, enhancing both question quality and answerer performance without external supervision. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Vision-language models (VLMs)…
10 -
Hugging Face Daily Papers research 15d ago
Verified Detection and Prevention of Concurrency Anomalies in Multi-Agent Large Language Model Systems
Abstract Multi-agent LLM systems with shared state are analyzed with formal methods, identifying concurrency anomalies and establishing a verified consistency hierarchy with mechanized proofs of soundness and completeness. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
14 -
Hugging Face Daily Papers research 15d ago
The Price of Anarchy in Disaggregated Inference
Abstract Disaggregated inference architectures separate prefill and decode phases across distinct GPU pools, and a game-theoretic analysis characterizes how GPU saturation affects system performance through regime transitions and payoff structure changes, enabling an adaptive…
25 -
Hugging Face Daily Papers research 15d ago
Dr-DCI: Scaling Direct Corpus Interaction via Dynamic Workspace Expansion
Abstract The Dr-DCI framework combines retrieval with direct corpus interaction by dynamically pulling relevant documents into a local workspace, enabling scalable and efficient agentic search across large corpora. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Agentic search over…
27 -
Hugging Face Daily Papers research 15d ago
Visual-Seeker: Towards Visual-Native Multimodal Agentic Search via Active Visual Reasoning
Abstract Visual-Seeker enables visual-native multimodal deep search through active visual reasoning, outperforming proprietary models on real-world web environments. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multimodal large language models (MLLMs) have demonstrated…
25 -
Hugging Face Daily Papers research 15d ago
EgoCS-400K: An Egocentric Gameplay Dataset for World Models
Abstract EgoCS-400K is a large-scale egocentric Counter-Strike dataset that bridges passive web videos and costly real-world embodied data by providing temporally aligned video-action-language trajectories with detailed player states and game events. Generated by…
16 -
Hugging Face Daily Papers research 15d ago
RepSelect: Robust LLM Unlearning via Representation Selectivity
Abstract RepSelect isolates forget-set-specific representations in LLMs by collapsing the top principal components of weight gradients, achieving deeper and more robust unlearning than existing methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Making large language models…
32 -
Hugging Face Daily Papers research 15d ago
RefGC-SR^2: Reference-guided Generated Content Super-Resolution and Refinement
Abstract A new task, reference-guided super-resolution and refinement of generated content, is introduced, together with a frequency-aware diffusion transformer model that simultaneously recovers high-resolution details and refines generative artifacts. Generated by…
32 -
Hugging Face Daily Papers research 15d ago
Text-Vision Co-Instructed Image Editing
Abstract A unified text-visual image editing framework is presented that combines semantic intent from textual instructions with spatial guidance from visual prompts to achieve more precise and faithful image manipulation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Existing…
16 -
Hugging Face Daily Papers research 15d ago
Learning from the Self-future: On-policy Self-distillation for dLLMs
Abstract d-OPSD introduces a novel on-policy self-distillation framework for diffusion language models by adapting self-teacher construction and supervision mechanisms to match the non-autoregressive nature of diffusion models. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
29 -
Hugging Face Daily Papers research 15d ago
LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling
Abstract Parallel loop Transformers achieve better code generation performance with two loops due to refined representations, while additional loops cause diminishing returns and increased positional mismatch costs. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Looped…
5 -
Hugging Face Daily Papers research 15d ago
Rethinking the Role of Efficient Attention in Hybrid Architectures
Abstract Hybrid architectures combining full attention with efficient attention modules like sliding-window attention exhibit distinct scaling behaviors and optimization trajectories, with efficient attention primarily affecting the emergence speed of long-context capabilities…
29 -
Hugging Face Daily Papers research 15d ago
Variable-Width Transformers
Abstract A novel transformer architecture with nonuniform width allocation across layers achieves better performance and efficiency compared to uniform designs by utilizing a parameter-free residual resizing mechanism. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Scaling model…
5 -
Hugging Face Daily Papers research 15d ago
ChLogic: Evaluating Robustness of Logical Reasoning in Chinese Expressions
Abstract The ChLogic benchmark reveals persistent performance gaps between English and Chinese logical reasoning in large language models, influenced by surface realization differences and translation artifacts. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large language models…
37 -
Hugging Face Daily Papers research 15d ago
MemSlides: A Hierarchical Memory Driven Agent Framework for Personalized Slide Generation with Multi-turn Local Revision
Abstract MemSlides presents a hierarchical memory framework for personalized presentation agents that separates long-term user profiles, working memory for session constraints, and tool memory for reusable execution experiences to enable stable personalization and reliable local…
21 -
Hugging Face Daily Papers research 15d ago
MotionVLA: Vision-Language-Action Model for Humanoid Motion
Abstract A dual-stream frequency tokenizer and autoregressive model are proposed to improve humanoid motion generation by separately encoding pose and physical dynamics, achieving better diversity and consistency compared to single-codebook approaches. Generated by…
11 -
Hugging Face Daily Papers research 15d ago
Show the Signal, Hide the Noise: Spectral Forcing for Pixel-Space Diffusion
Abstract Spectral Forcing, a time-conditional 2D-DCT low-pass operator, improves diffusion model efficiency by explicitly separating signal from noise in pixel-space models. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Pixel-space diffusion models are trained on full-bandwidth…
32 -
Hugging Face Daily Papers research 15d ago
ACE-Ego-0: Unifying Egocentric Human and Robotic Data for VLA Pretraining
Abstract A unified Vision-Language-Action pretraining framework leverages heterogeneous data sources including human egocentric videos and robot trajectories through a reliability-aware training approach that improves performance on embodied AI tasks. Generated by…
6 -
Hugging Face Daily Papers research 15d ago
ProCUA-SFT Technical Report
Abstract Training computer-use agents using a large-scale synthetic dataset with automated task generation and verification achieves significantly improved performance on desktop interaction benchmarks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Training computer-use agents…
4 -
Hugging Face Daily Papers research 15d ago
Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients
Abstract Zone of Proximal Policy Optimization (ZPPO) improves knowledge distillation by using reformulated prompts that help students learn from both correct and incorrect responses, enhancing performance especially at smaller model sizes. Generated by…
32 -
Hugging Face Daily Papers research 15d ago
OPD-Evolver: Cultivating Holistic Agent Evolver via On-Policy Distillation
Abstract OPD-Evolver is a self-evolving agent framework that combines slow-fast co-evolution with on-policy self-distillation to enhance memory management and policy learning across multiple domains. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Memory has become a standard…
28 -
Hugging Face Daily Papers research 15d ago
Beyond Monolingual Deep Research: Evaluating Agents and Retrievers with Cross-Lingual BrowseComp-Plus
Abstract Research agents face significant challenges when evidence is in a different language than the query, with performance degrading even when gold evidence is provided directly. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Deep research agents are increasingly evaluated on…
28 -
Hugging Face Daily Papers research 15d ago
A Gradient Perspective on RLVR Stability and Winner Advantage Policy Optimization
Abstract Training instability in reinforcement learning with verifiable rewards is analyzed through token-level gradient dynamics, leading to a stable policy optimization method that updates only on positive-advantage completions. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
20 -
Hugging Face Daily Papers research 15d ago
GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine?
Abstract End-to-end game generation presents significant challenges for coding agents, requiring them to create complete playable games from natural language descriptions while meeting specific evaluation criteria for engine grounding, artifact completeness, and interactive…
31 -
Hugging Face Daily Papers research 15d ago
Unified Multimodal Autoregressive Modeling with Shared Context-Visual Tokenizer is Key to Unification
Abstract UniAR presents a unified autoregressive framework that uses a single discrete visual tokenizer to bridge visual understanding and generation, achieving state-of-the-art results in image generation and editing through multi-level feature fusion, bitwise quantization, and…
19 -
Hugging Face Daily Papers research 15d ago
Looped World Models
Abstract Looped World Models introduce iterative latent state refinement through shared transformer blocks, achieving 100x parameter efficiency while adapting computational depth to prediction complexity. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Current world models face a…
14 -
Hugging Face Daily Papers research 15d ago
TRIAGE: Dialectical Reasoning for Explainable Risk Prediction on Irregularly Sampled Medical Time Series with LLMs
Abstract A framework called TRIAGE is proposed to improve clinical early warning systems by training large language models to generate dialectical reasoning for continuous risk scoring with better calibration and interpretability. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
29 -
Hugging Face Daily Papers research 15d ago
LectūraAgents: A Multi-Agent Framework for Adaptive Personalized AI-Assisted Learning and Embodied Teaching
Abstract LectūraAgents is a multi-agent framework that enables personalized learning through adaptive embodied teaching by mimicking professor-student interactions and generating coordinated teaching actions aligned with learner profiles. Generated by…
9 -
Hugging Face Daily Papers research 15d ago
Aligning Quantum Operators with Large Language Models
Abstract Large language models can be adapted to understand quantum operators by mapping unitary matrices into their latent space, enabling quantum circuit synthesis and language-conditioned gate constraint specification. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Can Large…
19 -
Hugging Face Daily Papers research 15d ago
Attacks on Machine-Text Detectors Retain Stylistic Fingerprints
Abstract Machine-text detection remains vulnerable to evasion techniques, but stylistic features can provide a robust defense when analyzed across multiple documents rather than individual instances. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Despite considerable progress…
17 -
Hugging Face Daily Papers research 16d ago
You Don't Need Strong Assumptions: Visual Representation Learning via Temporal Differences
Abstract Temporal Difference in Vision (TDV) presents a novel self-supervised learning approach for video data that eliminates traditional inductive biases by leveraging causal relationships between past and future frames. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Progress in…
30 -
Hugging Face Daily Papers research 16d ago
Track2View: 4D-Consistent Camera-Controlled Video Generation via Paired 3D Point Tracks
Abstract Track2View generates novel camera viewpoints from videos by using 3D point tracks to establish explicit spatiotemporal correspondences, achieving superior visual quality and camera accuracy compared to existing methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
9 -
Hugging Face Daily Papers research 16d ago
ExpRL: Exploratory RL for LLM Mid-Training
Abstract ExpRL uses human-written question-answer data as reward scaffolds to provide automated reinforcement learning priming for language models, outperforming traditional methods on math reasoning tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Sparse reward reinforcement…
23 -
Hugging Face Daily Papers research 16d ago
Human Universal Grasping
Abstract A flow-matching model generates diverse human grasps from RGB-D images, enabling zero-shot robotic grasping with improved performance over existing methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Humans can grasp objects effortlessly, whereas multi-fingered robots…
25 -
Hugging Face Daily Papers research 16d ago
EgoPhys: Learning Generalizable Physics Models of Deformable Objects from Egocentric Video
Abstract EgoPhys enables deformable digital twin generation from egocentric RGB video by using generalizable priors and compact codebooks to predict dense spring stiffness fields without per-spring optimization. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Humans naturally…
33 -
Hugging Face Daily Papers research 16d ago
Unstable Features, Reproducible Subspaces: Understanding Seed Dependence in Sparse Autoencoders
Abstract Sparse autoencoders exhibit feature stability patterns in which stable features carry most of the predictive signal, while unstable features, though not individually reproducible, reflect reproducible low-dimensional structure. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Sparse…
13 -
Hugging Face Daily Papers research 16d ago
LaWAM: Latent World Action Models for Efficient Dynamics-Aware Robot Policies
Abstract LaWAM enables efficient robot control by predicting compact latent visual subgoals instead of expensive video generation, achieving high performance with reduced computational latency. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Vision-Language-Action models (VLAs)…
33 -
Hugging Face Daily Papers research 16d ago
MVEB: Massive Video Embedding Benchmark
Abstract A large-scale video embedding benchmark evaluates diverse models across multiple video understanding tasks, revealing that different model architectures excel in specific domains and demonstrating the nuanced impact of audio on performance based on dataset…
7