Hugging Face Daily Papers
Hugging Face Daily Papers research 13d ago
No Resource, No Benchmarks, No Problem? Evaluating and Improving LLMs for Code Generation in No-Resource Languages
Abstract Research addresses code generation challenges for no-resource programming languages by developing benchmarks and proposing a method that combines further pre-training with weight difference transfer to create specialized instruction-following models at reduced…
27 -
Hugging Face Daily Papers research 13d ago
JAMER: Project-Level Code Framework Dataset and Benchmark on Professional Game Engines
Abstract Game development frameworks and benchmarks were created using data from game jam competitions to evaluate code generation and project-level programming capabilities. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Current AI-driven game development has made substantial…
25 -
Hugging Face Daily Papers research 13d ago
ImageWAM: Do World Action Models Really Need Video Generation, or Just Image Editing?
Abstract ImageWAM demonstrates that pretrained image editing models can effectively replace video generation in world action models for robot control, achieving better performance with reduced computational costs. Generated by Qwen/Qwen2.5-Coder-32B-Instruct World Action Models…
25 -
Hugging Face Daily Papers research 13d ago
ENPIRE: Agentic Robot Policy Self-Improvement in the Real World
Abstract The ENPIRE framework enables autonomous robotics research through a closed-loop system that automates policy improvement via environment feedback, policy refinement, and evolutionary code optimization. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Achieving dexterous robotic…
27 -
Hugging Face Daily Papers research 13d ago
Holo-World: Unified Camera, Object and Weather Control for Video World Model
Abstract A unified controllable video world model generates videos from a single image while preserving scene structure and transferring to target weather states through specialized parameterization and conditioning techniques. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Video…
22 -
Hugging Face Daily Papers research 13d ago
Current World Models Lack a Persistent State Core
Abstract Current world models fail to maintain consistent world states when unobserved, indicating a need for design changes that prioritize physical state stability over appearance fidelity. Generated by Qwen/Qwen2.5-Coder-32B-Instruct World models are increasingly regarded as…
18 -
Hugging Face Daily Papers research 13d ago
Taylor-Calibrate: Principled Initialization for Hybrid Linear Attention Distillation
Abstract Hybrid linear attention models can be improved through a novel initialization technique that enhances conversion from pretrained Transformers by leveraging teacher attention statistics and alignment steps. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Hybrid linear…
6 -
Hugging Face Daily Papers research 13d ago
FlowBender: Feedback-Aware Training for Self-Correcting Conditional Flows
Abstract FlowBender is a closed-loop framework that addresses constraint satisfaction in diffusion and flow models by training networks to correct alignment errors using inference-time feedback, outperforming traditional supervised and guidance-based approaches across multiple…
11 -
Hugging Face Daily Papers research 13d ago
DragMesh-2: Physically Plausible Dexterous Hand-Object Interaction with Articulated Objects
Abstract DragMesh-2 enables dexterous hand-object interaction through contact-driven manipulation, with PICA enhancing robustness under varying contact loads without tactile feedback. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Dexterous interaction with articulated objects is…
19 -
Hugging Face Daily Papers research 13d ago
HumanScale: Egocentric Human Video Can Outperform Real-Robot Data for Embodied Pretraining
Abstract Egocentric human video can effectively replace teleoperated robot trajectories for embodied model pretraining, achieving better performance with reduced data collection costs. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Embodied foundation models are expected to…
22 -
Hugging Face Daily Papers research 13d ago
DF3DV-1K: A Large-Scale Dataset and Benchmark for Distractor-Free Novel View Synthesis
Abstract A large-scale real-world dataset called DF3DV-1K is introduced to address the lack of clean and cluttered image sets for distractor-free radiance field research, containing 1,048 scenes with 89,924 images across 128 distractor types and 161 scene themes, along with a…
5 -
Hugging Face Daily Papers research 13d ago
Playful Agentic Robot Learning
Abstract Embodied robots learn reusable skills through self-directed play and exploration, then apply these skills to improve performance on downstream tasks without additional training. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Current agentic robot systems can write…
4 -
Hugging Face Daily Papers research 13d ago
S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence
Abstract S-Agent is a spatial reasoning framework that enhances visual language models with temporal memory and hierarchical spatial tools to enable continuous 3D world understanding from multi-view imagery. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Real-world spatial…
28 -
Hugging Face Daily Papers research 13d ago
JanusMesh: Fast and Zero-Shot 3D Visual Illusion Generation via Cross-Space Denoising
Abstract A fast, training-free framework generates text-driven 3D visual illusions by decoupling generation into cross-space dual-branch denoising and view-conditioned texture synthesis for seamless geometric fusion and semantic coherence. Generated by…
26 -
Hugging Face Daily Papers research 13d ago
Understanding the Behaviors of Environment-aware Information Retrieval
Abstract Large language models can be trained via reinforcement learning to adapt query formulation strategies for different retrievers, with distinct optimal query styles and improved performance through retriever-specific guidance and model scaling. Generated by…
16 -
Hugging Face Daily Papers research 13d ago
FAPO: Fully Autonomous Prompt Optimization of Multi-Step LLM Pipelines
Abstract FAPO optimizes LLM pipelines by combining prompt editing with structural changes, demonstrating superior performance across multiple benchmarks and security tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multi-step LLM pipelines fail through interactions among…
38 -
Hugging Face Daily Papers research 13d ago
Selective Synergistic Learning for Video Object-Centric Learning
Abstract Selective Synergistic Learning (SSync) addresses limitations in video object-centric learning by selectively distilling reliable cues through pseudo-labeling and transitive merging to improve object decomposition quality and robustness. Generated by…
30 -
Hugging Face Daily Papers research 13d ago
Adaptive Volumetric Mechanical Property Fields Invariant to Resolution
Abstract AdaVoMP predicts dense spatially-varying mechanical properties for 3D objects using a sparse adaptive voxel structure and transformer encoder-decoder model, enabling realistic deformable simulations with improved accuracy and efficiency. Generated by…
33 -
Hugging Face Daily Papers research 13d ago
FreeStyle: Free Control of Style-Content Dual-Reference Generation from Community LoRA Mining
Abstract FreeStyle is a scalable dual-reference generation framework that uses community LoRA mining to create large-scale style-content triplets while addressing content leakage through disentanglement mechanisms and a comprehensive benchmark. Generated by…
16 -
Hugging Face Daily Papers research 13d ago
Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents
Abstract Aggregate-score leaderboards in agent benchmarks fail to capture deployment-relevant dimensions and show rank instability, necessitating new evaluation frameworks based on predictive validity and out-of-distribution criteria. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
27 -
Hugging Face Daily Papers research 13d ago
Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance
Abstract A lightweight image inpainting framework achieves high-fidelity results with significantly reduced parameters and inference time through novel local-global interaction blocks and adaptive distillation strategies. Generated by Qwen/Qwen2.5-Coder-32B-Instruct While…
35 -
Hugging Face Daily Papers research 13d ago
Thinking with Visual Grounding
Abstract Visually grounded thinking integrates natural-language reasoning with explicit visual evidence grounding in vision-language models, improving reasoning accuracy through scalable synthesis and reinforcement learning techniques. Generated by…
34 -
Hugging Face Daily Papers research 13d ago
LooseControlVideo: Directorial Video Control using Spatial Blocking
Abstract LooseControlVideo enables intuitive 3D spatial control in text-to-video generation using sparse oriented 3D boxes as proxies, achieving superior trajectory accuracy and occlusion handling compared to existing methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Precise…
10 -
Hugging Face Daily Papers research 13d ago
REVES: REvision and VErification--Augmented Training for Test-Time Scaling
Abstract A two-stage iterative framework alternates between data augmentation and policy optimization to improve LLM reasoning by leveraging intermediate correction steps, achieving superior performance on coding benchmarks and constraint satisfaction problems. Generated by…
23 -
Hugging Face Daily Papers research 13d ago
Re-Centering Humans in LLM Personalization
Abstract Human-centered evaluation reveals significant gaps between synthetic and real-world LLM personalization performance, with models struggling to extract user attributes and generate truly personalized responses that match human quality judgments. Generated by…
30 -
Hugging Face Daily Papers research 13d ago
Reinforcement Learning-Guided Retrieval with Soft Fusion for Robust Multimodal Imitation Learning under Missing Modalities
Abstract RL4IL enables robust robotic manipulation under sensor dropout by using reinforcement learning to retrieve relevant demonstrations and cross-attention fusion to impute missing modalities without retraining. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Robotic systems…
23 -
Hugging Face Daily Papers research 13d ago
When Does Trajectory-Level Supervision Permit Efficient Offline Reinforcement Learning?
Abstract Offline reinforcement learning with trajectory-level outcome supervision presents statistical challenges that can be addressed through pessimistic actor-critic methods, though fundamental barriers exist for certain generalized outcome-based problems. Generated by…
35 -
Hugging Face Daily Papers research 13d ago
HiLo-Token: Input-Adaptive High-Low Frequency Token Compression for Efficient Image Editing
Abstract A novel token compression framework called HiLo-Token is introduced to accelerate Diffusion Transformers in image editing tasks by adaptively allocating tokens based on spatial frequency and context importance, achieving significant speedups without quality loss.…
27 -
Hugging Face Daily Papers research 14d ago
The Reward Was in Your Data All Along: Correcting Flow Matching with Discriminator-Guided RL
Abstract Discriminator-Guided Reinforcement Learning (DRL) addresses alignment issues in score- and flow-matching models by using a pretrained representation space discriminator as an optimal reward signal, improving both visual fidelity and semantic quality without human…
4 -
Hugging Face Daily Papers research 14d ago
MaineCoon: Pursuing A Real-Time Audio-Visual Social World Model
Abstract MaineCoon represents the first real-time audio-visual autoregressive model for social worlds, achieving high frame rates and long-horizon generation through novel training techniques and inference frameworks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct As an increasing…
21 -
Hugging Face Daily Papers research 14d ago
MolmoMotion: Forecasting Point Trajectories in 3D with Language Instruction
Abstract A 3D point motion forecasting model predicts object trajectories from visual history and language goals, demonstrating superior performance on benchmarks and transferring effectively to robot manipulation and video generation tasks. Generated by…
4 -
Hugging Face Daily Papers research 14d ago
ViT-Up: Faithful Feature Upsampling for Vision Transformers
Abstract ViT-Up is a feature upsampling framework for Vision Transformers that uses layer-wise query construction from hidden states to improve dense prediction tasks, outperforming existing image-guided methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Vision Transformers…
27 -
Hugging Face Daily Papers research 14d ago
Bag of Dims: Training-Free Mechanistic Interpretability via Dimension-Level Sign Patterns
Abstract The standard basis of transformer hidden states serves as a training-free, architecture-general feature representation where individual dimensions encode semantic content through signs and confidence through magnitudes, functioning as independent binary registers…
10 -
Hugging Face Daily Papers research 14d ago
iOSWorld: A Benchmark for Personally Intelligent Phone Agents
Abstract iOSWorld is introduced as the first interactive native iOS simulator benchmark featuring persistent user identity across multiple apps to evaluate personalized mobile agent capabilities. Generated by Qwen/Qwen2.5-Coder-32B-Instruct A useful phone agent needs to be…
6 -
Hugging Face Daily Papers research 14d ago
MyPCBench: A Benchmark for Personally Intelligent Computer-Use Agents
Abstract MyPCBench evaluates computer-use agents as personal assistants in a simulated Linux desktop environment with real-world web applications, revealing that Claude Opus 4.6 achieves the highest task completion rate of 55.4% while struggling with multi-application tasks and…
29 -
Hugging Face Daily Papers research 14d ago
A Benchmark and Framework for Evaluating Next Action Predictions in Spreadsheets
Abstract A benchmark for predicting spreadsheet user actions is introduced, addressing challenges in edit history availability and complex action spaces through manual curation and online evaluation methodology. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Predictive code…
17 -
Hugging Face Daily Papers research 14d ago
LLM-Enabled NWDAF: A Step Toward AI-Native 6G Network Intelligence
Abstract An open-source Network Data Analytics Function compatible with Free5GC integrates a Large Language Model interface for natural language interaction and intent-based network management. Generated by Qwen/Qwen2.5-Coder-32B-Instruct The Network Data Analytics Function…
17 -
Hugging Face Daily Papers research 14d ago
STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability
Abstract GRPO algorithms face policy entropy collapse during training, which STARE addresses through surprisal-guided token-level advantage reweighting and target-entropy regulation to maintain stable reinforcement learning for large language models. Generated by…
13 -
Hugging Face Daily Papers research 14d ago
Morpheus: A Morphology-Aware Neural Tokenizer and Word Embedder for Turkish
Abstract A neural morpheme-boundary model for Turkish achieves lossless tokenization and morphology-aware embeddings with improved efficiency and performance over traditional subword methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Turkish is agglutinative: meaning is…
27 -
Hugging Face Daily Papers research 14d ago
From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning
Abstract A framework automates environment redesign in reinforcement learning for large language models by having the policy analyze failures and suggest configuration changes, achieving superior performance over larger proprietary models and fixed-environment baselines.…
6 -
Hugging Face Daily Papers research 14d ago
EfficientRollout: System-Aware Self-Speculative Decoding for RL Rollouts
Abstract EfficientRollout is a system-aware self-speculative decoding framework that accelerates reinforcement learning rollouts by adapting drafters to evolving policies and optimizing speculative decoding regimes. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Reinforcement…
36 -
Hugging Face Daily Papers research 14d ago
Beyond Alignment: Value Diversity as a Collective Property in Multicultural Agent Systems
Abstract Multicultural multi-agent systems exhibit limited value diversity despite cultural alignment, with social interaction reducing diversity and compromising collective decision-making breadth. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multicultural multi-agent systems…
28 -
Hugging Face Daily Papers research 14d ago
PAIWorld: A 3D-Consistent World Foundation Model for Robotic Manipulation
Abstract PAIWorld enhances diffusion-transformer world models with geometric awareness and cross-view attention to improve multi-view 3D consistency for robotic manipulation tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct World foundation models (WFMs) are powerful…
18 -
Hugging Face Daily Papers research 14d ago
Externalizing Research Synthesis and Validation in AI Scientists through a Research Harness
Abstract Xcientist enables transparent and accountable AI-driven scientific research by creating persistent artifacts that track the complete research process from problem formulation to mechanism validation and revision. Generated by Qwen/Qwen2.5-Coder-32B-Instruct AI systems…
11 -
Hugging Face Daily Papers research 14d ago
SciOrch: Learning to Orchestrate Expert LLMs for Solving Frontier Multimodal Scientific Reasoning Tasks
Abstract SciOrch is a framework that uses a lightweight orchestrator model to coordinate multiple frontier LLMs for scientific reasoning, achieving superior performance through MCTS-based training and GRPO-style optimization while reducing API costs. Generated by…
31 -
Hugging Face Daily Papers research 14d ago
RODS: Reward-Driven Online Data Synthesis for Multi-Turn Tool-Use Agents
Abstract RODS addresses sample depletion in multi-turn tool-use reinforcement learning by dynamically synthesizing new data based on reward variance to maintain informative training samples. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multi-turn tool-use RL is bottlenecked by…
21 -
Hugging Face Daily Papers research 14d ago
Native Active Perception as Reasoning for Omni-Modal Understanding
Abstract OmniAgent is a novel omni-modal agent that addresses long video understanding by using an iterative observation-thought-action cycle with active perception, achieving superior performance compared to larger models through efficient selective processing. Generated by…
24 -
Hugging Face Daily Papers research 14d ago
Reinforcing Dual-Path Reasoning in Spatial Vision Language Models
Abstract A unified framework for spatial vision-language models combines linguistic deduction and 3D geometric reasoning through reinforcement learning, enabling robust spatial reasoning across diverse tasks and domains. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Spatial…
9 -
Hugging Face Daily Papers research 14d ago
SAE Interventions are Unreliable: Post-Intervention Recovery of Suppressed Behavior
Abstract Sparse Autoencoders' feature-level interventions may appear successful but can be circumvented through residual-space optimization that recovers original behaviors, revealing limitations in using SAE features for complete behavioral control. Generated by…
25 -
Hugging Face Daily Papers research 14d ago
Seeing Before Reasoning: Decoupling Perception and Reasoning for Shortcut-Resilient Multimodal On-Policy Self-Distillation
Abstract ViGOS is a visually grounded on-policy self-distillation framework for multimodal large language models that improves image-grounded behavior by using specialized teachers for different stages of reasoning and handling invalid rollouts. Generated by…
8