Hugging Face Daily Papers


Hugging Face Daily Papers research 13d ago

No Resource, No Benchmarks, No Problem? Evaluating and Improving LLMs for Code Generation in No-Resource Languages

Abstract Research addresses code generation challenges for no-resource programming languages by developing benchmarks and proposing a method that combines further pre-training with weight difference transfer to create specialized instruction-following models at reduced…

27
Hugging Face Daily Papers research 13d ago

JAMER: Project-Level Code Framework Dataset and Benchmark on Professional Game Engines

Abstract Game development frameworks and benchmarks were created using data from game jam competitions to evaluate code generation and project-level programming capabilities. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Current AI-driven game development has made substantial…

25
Hugging Face Daily Papers research 13d ago

ImageWAM: Do World Action Models Really Need Video Generation, or Just Image Editing?

Abstract ImageWAM demonstrates that pretrained image editing models can effectively replace video generation in world action models for robot control, achieving better performance with reduced computational costs. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) World Action Models…

25
Hugging Face Daily Papers research 13d ago

ENPIRE: Agentic Robot Policy Self-Improvement in the Real World

Abstract The ENPIRE framework enables autonomous robotics research through a closed-loop system that automates policy improvement via environment feedback, policy refinement, and evolutionary code optimization. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Achieving dexterous robotic…

27
Hugging Face Daily Papers research 13d ago

Holo-World: Unified Camera, Object and Weather Control for Video World Model

Abstract A unified controllable video world model generates videos from a single image while preserving scene structure and transferring to target weather states through specialized parameterization and conditioning techniques. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Video…

22
Hugging Face Daily Papers research 13d ago

Current World Models Lack a Persistent State Core

Abstract Current world models fail to maintain consistent world states when unobserved, indicating a need for design changes that prioritize physical state stability over appearance fidelity. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) World models are increasingly regarded as…

18
Hugging Face Daily Papers research 13d ago

Taylor-Calibrate: Principled Initialization for Hybrid Linear Attention Distillation

Abstract Hybrid linear attention models can be improved through a novel initialization technique that enhances conversion from pretrained Transformers by leveraging teacher attention statistics and alignment steps. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Hybrid linear…

6
Hugging Face Daily Papers research 13d ago

FlowBender: Feedback-Aware Training for Self-Correcting Conditional Flows

Abstract FlowBender is a closed-loop framework that addresses constraint satisfaction in diffusion and flow models by training networks to correct alignment errors using inference-time feedback, outperforming traditional supervised and guidance-based approaches across multiple…

11
Hugging Face Daily Papers research 13d ago

DragMesh-2: Physically Plausible Dexterous Hand-Object Interaction with Articulated Objects

Abstract DragMesh-2 enables dexterous hand-object interaction through contact-driven manipulation, with PICA enhancing robustness under varying contact loads without tactile feedback. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Dexterous interaction with articulated objects is…

19
Hugging Face Daily Papers research 13d ago

HumanScale: Egocentric Human Video Can Outperform Real-Robot Data for Embodied Pretraining

Abstract Egocentric human video can effectively replace teleoperated robot trajectories for embodied model pretraining, achieving better performance with reduced data collection costs. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Embodied foundation models are expected to…

22
Hugging Face Daily Papers research 13d ago

DF3DV-1K: A Large-Scale Dataset and Benchmark for Distractor-Free Novel View Synthesis

Abstract A large-scale real-world dataset called DF3DV-1K is introduced to address the lack of clean and cluttered image sets for distractor-free radiance field research, containing 1,048 scenes with 89,924 images across 128 distractor types and 161 scene themes, along with a…

5
Hugging Face Daily Papers research 13d ago

Playful Agentic Robot Learning

Abstract Embodied robots learn reusable skills through self-directed play and exploration, then apply these skills to improve performance on downstream tasks without additional training. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Current agentic robot systems can write…

4
Hugging Face Daily Papers research 13d ago

S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence

Abstract S-Agent is a spatial reasoning framework that enhances visual language models with temporal memory and hierarchical spatial tools to enable continuous 3D world understanding from multi-view imagery. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Real-world spatial…

28
Hugging Face Daily Papers research 13d ago

JanusMesh: Fast and Zero-Shot 3D Visual Illusion Generation via Cross-Space Denoising

Abstract A fast, training-free framework generates text-driven 3D visual illusions by decoupling generation into cross-space dual-branch denoising and view-conditioned texture synthesis for seamless geometric fusion and semantic coherence. Generated by…

26
Hugging Face Daily Papers research 13d ago

Understanding the Behaviors of Environment-aware Information Retrieval

Abstract Large language models can be trained via reinforcement learning to adapt query formulation strategies for different retrievers, with distinct optimal query styles and improved performance through retriever-specific guidance and model scaling. Generated by…

16
Hugging Face Daily Papers research 13d ago

FAPO: Fully Autonomous Prompt Optimization of Multi-Step LLM Pipelines

Abstract FAPO optimizes LLM pipelines by combining prompt editing with structural changes, demonstrating superior performance across multiple benchmarks and security tasks. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Multi-step LLM pipelines fail through interactions among…

38
Hugging Face Daily Papers research 13d ago

Selective Synergistic Learning for Video Object-Centric Learning

Abstract Selective Synergistic Learning (SSync) addresses limitations in video object-centric learning by selectively distilling reliable cues through pseudo-labeling and transitive merging to improve object decomposition quality and robustness. Generated by…

30
Hugging Face Daily Papers research 13d ago

Adaptive Volumetric Mechanical Property Fields Invariant to Resolution

Abstract AdaVoMP predicts dense spatially-varying mechanical properties for 3D objects using a sparse adaptive voxel structure and transformer encoder-decoder model, enabling realistic deformable simulations with improved accuracy and efficiency. Generated by…

33
Hugging Face Daily Papers research 13d ago

FreeStyle: Free Control of Style-Content Dual-Reference Generation from Community LoRA Mining

Abstract FreeStyle is a scalable dual-reference generation framework that uses community LoRA mining to create large-scale style-content triplets while addressing content leakage through disentanglement mechanisms and a comprehensive benchmark. Generated by…

16
Hugging Face Daily Papers research 13d ago

Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents

Abstract Aggregate-score leaderboards in agent benchmarks fail to capture deployment-relevant dimensions and show rank instability, necessitating new evaluation frameworks based on predictive validity and out-of-distribution criteria. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

27
Hugging Face Daily Papers research 13d ago

Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance

Abstract A lightweight image inpainting framework achieves high-fidelity results with significantly reduced parameters and inference time through novel local-global interaction blocks and adaptive distillation strategies. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) While…

35
Hugging Face Daily Papers research 13d ago

Thinking with Visual Grounding

Abstract Visually grounded thinking integrates natural-language reasoning with explicit visual evidence grounding in vision-language models, improving reasoning accuracy through scalable synthesis and reinforcement learning techniques. Generated by…

34
Hugging Face Daily Papers research 13d ago

LooseControlVideo: Directorial Video Control using Spatial Blocking

Abstract LooseControlVideo enables intuitive 3D spatial control in text-to-video generation using sparse oriented 3D boxes as proxies, achieving superior trajectory accuracy and occlusion handling compared to existing methods. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Precise…

10
Hugging Face Daily Papers research 13d ago

REVES: REvision and VErification-Augmented Training for Test-Time Scaling

Abstract A two-stage iterative framework alternates between data augmentation and policy optimization to improve LLM reasoning by leveraging intermediate correction steps, achieving superior performance on coding benchmarks and constraint satisfaction problems. Generated by…

23
Hugging Face Daily Papers research 13d ago

Re-Centering Humans in LLM Personalization

Abstract Human-centered evaluation reveals significant gaps between synthetic and real-world LLM personalization performance, with models struggling to extract user attributes and generate truly personalized responses that match human quality judgments. Generated by…

30
Hugging Face Daily Papers research 13d ago

Reinforcement Learning-Guided Retrieval with Soft Fusion for Robust Multimodal Imitation Learning under Missing Modalities

Abstract RL4IL enables robust robotic manipulation under sensor dropout by using reinforcement learning to retrieve relevant demonstrations and cross-attention fusion to impute missing modalities without retraining. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Robotic systems…

23
Hugging Face Daily Papers research 13d ago

When Does Trajectory-Level Supervision Permit Efficient Offline Reinforcement Learning?

Abstract Offline reinforcement learning with trajectory-level outcome supervision presents statistical challenges that can be addressed through pessimistic actor-critic methods, though fundamental barriers exist for certain generalized outcome-based problems. Generated by…

35
Hugging Face Daily Papers research 13d ago

HiLo-Token: Input-Adaptive High-Low Frequency Token Compression for Efficient Image Editing

Abstract A novel token compression framework called HiLo-Token is introduced to accelerate Diffusion Transformers in image editing tasks by adaptively allocating tokens based on spatial frequency and context importance, achieving significant speedups without quality loss.…

27
Hugging Face Daily Papers research 14d ago

The Reward Was in Your Data All Along: Correcting Flow Matching with Discriminator-Guided RL

Abstract Discriminator-Guided Reinforcement Learning (DRL) addresses alignment issues in score- and flow-matching models by using a pretrained representation space discriminator as an optimal reward signal, improving both visual fidelity and semantic quality without human…

4
Hugging Face Daily Papers research 14d ago

MaineCoon: Pursuing A Real-Time Audio-Visual Social World Model

Abstract MaineCoon represents the first real-time audio-visual autoregressive model for social worlds, achieving high frame rates and long-horizon generation through novel training techniques and inference frameworks. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) As an increasing…

21
Hugging Face Daily Papers research 14d ago

MolmoMotion: Forecasting Point Trajectories in 3D with Language Instruction

Abstract A 3D point motion forecasting model predicts object trajectories from visual history and language goals, demonstrating superior performance on benchmarks and transferring effectively to robot manipulation and video generation tasks. Generated by…

4
Hugging Face Daily Papers research 14d ago

ViT-Up: Faithful Feature Upsampling for Vision Transformers

Abstract ViT-Up is a feature upsampling framework for Vision Transformers that uses layer-wise query construction from hidden states to improve dense prediction tasks, outperforming existing image-guided methods. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Vision Transformers…

27
Hugging Face Daily Papers research 14d ago

Bag of Dims: Training-Free Mechanistic Interpretability via Dimension-Level Sign Patterns

Abstract The standard basis of transformer hidden states serves as a training-free, architecture-general feature representation where individual dimensions encode semantic content through signs and confidence through magnitudes, functioning as independent binary registers…

10
Hugging Face Daily Papers research 14d ago

iOSWorld: A Benchmark for Personally Intelligent Phone Agents

Abstract iOSWorld is introduced as the first interactive native iOS simulator benchmark featuring persistent user identity across multiple apps to evaluate personalized mobile agent capabilities. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) A useful phone agent needs to be…

6
Hugging Face Daily Papers research 14d ago

MyPCBench: A Benchmark for Personally Intelligent Computer-Use Agents

Abstract MyPCBench evaluates computer-use agents as personal assistants in a simulated Linux desktop environment with real-world web applications, revealing that Claude Opus 4.6 achieves the highest task completion rate of 55.4% while struggling with multi-application tasks and…

29
Hugging Face Daily Papers research 14d ago

A Benchmark and Framework for Evaluating Next Action Predictions in Spreadsheets

Abstract A benchmark for predicting spreadsheet user actions is introduced, addressing challenges in edit history availability and complex action spaces through manual curation and online evaluation methodology. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Predictive code…

17
Hugging Face Daily Papers research 14d ago

LLM-Enabled NWDAF: A Step Toward AI-Native 6G Network Intelligence

Abstract An open-source Network Data Analytics Function compatible with Free5GC integrates a Large Language Model interface for natural language interaction and intent-based network management. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) The Network Data Analytics Function…

17
Hugging Face Daily Papers research 14d ago

STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability

Abstract GRPO algorithms face policy entropy collapse during training, which STARE addresses through surprisal-guided token-level advantage reweighting and target-entropy regulation to maintain stable reinforcement learning for large language models. Generated by…

13
Hugging Face Daily Papers research 14d ago

Morpheus: A Morphology-Aware Neural Tokenizer and Word Embedder for Turkish

Abstract A neural morpheme-boundary model for Turkish achieves lossless tokenization and morphology-aware embeddings with improved efficiency and performance over traditional subword methods. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Turkish is agglutinative: meaning is…

27
Hugging Face Daily Papers research 14d ago

From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning

Abstract A framework automates environment redesign in reinforcement learning for large language models by having the policy analyze failures and suggest configuration changes, achieving superior performance over larger proprietary models and fixed-environment baselines.…

6
Hugging Face Daily Papers research 14d ago

EfficientRollout: System-Aware Self-Speculative Decoding for RL Rollouts

Abstract EfficientRollout is a system-aware self-speculative decoding framework that accelerates reinforcement learning rollouts by adapting drafters to evolving policies and optimizing speculative decoding regimes. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Reinforcement…

36
Hugging Face Daily Papers research 14d ago

Beyond Alignment: Value Diversity as a Collective Property in Multicultural Agent Systems

Abstract Multicultural multi-agent systems exhibit limited value diversity despite cultural alignment, with social interaction reducing diversity and compromising collective decision-making breadth. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Multicultural multi-agent systems…

28
Hugging Face Daily Papers research 14d ago

PAIWorld: A 3D-Consistent World Foundation Model for Robotic Manipulation

Abstract PAIWorld enhances diffusion-transformer world models with geometric awareness and cross-view attention to improve multi-view 3D consistency for robotic manipulation tasks. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) World foundation models (WFMs) are powerful…

18
Hugging Face Daily Papers research 14d ago

Externalizing Research Synthesis and Validation in AI Scientists through a Research Harness

Abstract Xcientist enables transparent and accountable AI-driven scientific research by creating persistent artifacts that track the complete research process from problem formulation to mechanism validation and revision. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) AI systems…

11
Hugging Face Daily Papers research 14d ago

SciOrch: Learning to Orchestrate Expert LLMs for Solving Frontier Multimodal Scientific Reasoning Tasks

Abstract SciOrch is a framework that uses a lightweight orchestrator model to coordinate multiple frontier LLMs for scientific reasoning, achieving superior performance through MCTS-based training and GRPO-style optimization while reducing API costs. Generated by…

31
Hugging Face Daily Papers research 14d ago

RODS: Reward-Driven Online Data Synthesis for Multi-Turn Tool-Use Agents

Abstract RODS addresses sample depletion in multi-turn tool-use reinforcement learning by dynamically synthesizing new data based on reward variance to maintain informative training samples. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Multi-turn tool-use RL is bottlenecked by…

21
Hugging Face Daily Papers research 14d ago

Native Active Perception as Reasoning for Omni-Modal Understanding

Abstract OmniAgent is a novel omni-modal agent that addresses long video understanding by using an iterative observation-thought-action cycle with active perception, achieving superior performance compared to larger models through efficient selective processing. Generated by…

24
Hugging Face Daily Papers research 14d ago

Reinforcing Dual-Path Reasoning in Spatial Vision Language Models

Abstract A unified framework for spatial vision-language models combines linguistic deduction and 3D geometric reasoning through reinforcement learning, enabling robust spatial reasoning across diverse tasks and domains. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Spatial…

9
Hugging Face Daily Papers research 14d ago

SAE Interventions are Unreliable: Post-Intervention Recovery of Suppressed Behavior

Abstract Sparse Autoencoders' feature-level interventions may appear successful but can be circumvented through residual-space optimization that recovers original behaviors, revealing limitations in using SAE features for complete behavioral control. Generated by…

25
Hugging Face Daily Papers research 14d ago

Seeing Before Reasoning: Decoupling Perception and Reasoning for Shortcut-Resilient Multimodal On-Policy Self-Distillation

Abstract ViGOS is a visually grounded on-policy self-distillation framework for multimodal large language models that improves image-grounded behavior by using specialized teachers for different stages of reasoning and handling invalid rollouts. Generated by…

8
