Hugging Face Daily Papers
500 articles archived
Hugging Face Daily Papers research 8d ago
OpenThoughts-Agent: Data Recipes for Agentic Models
Abstract An open-source data curation pipeline for training agentic language models is presented, demonstrating superior performance through systematic experimentation and scalable training data. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Agentic language models dramatically…
34 -
Hugging Face Daily Papers research 8d ago
DiffusionBench: On Holistic Evaluation of Diffusion Transformers
Abstract Researchers introduce NanoGen, a unified framework for training and evaluating diffusion transformers that demonstrates the need for comprehensive benchmarking beyond ImageNet class-conditional generation to assess true progress in generative modeling. Generated by…
25 -
Hugging Face Daily Papers research 8d ago
FLUX3D: High-Fidelity 3D Gaussian Generation with Diffusion-Aligned Sparse Representation
Abstract FLUX3D addresses limitations in image-to-3D Gaussian Splatting generation by improving representation learning and cross-modal alignment through specialized architectures and attention mechanisms. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Sparse voxel representation…
34 -
Hugging Face Daily Papers research 8d ago
World Value Models for Robotic Manipulation
Abstract World Value Model combines world models with value estimation to provide accurate task progression assessment and improve robotic policy learning from mixed-quality data. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Generalist value models play a pivotal role in scaling…
6 -
Hugging Face Daily Papers research 8d ago
LingxiDiagBench: A Multi-Agent Framework for Benchmarking LLMs in Chinese Psychiatric Consultation and Diagnosis
Abstract A large-scale multi-agent benchmark for evaluating LLMs in Chinese psychiatric diagnosis is introduced, highlighting challenges in dynamic consultation and the gap between consultation quality and diagnostic accuracy. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Mental…
36 -
Hugging Face Daily Papers research 8d ago
FLAT: Feedforward Latent Triangle Splatting for Geometrically Accurate Scene Generation
Abstract Video diffusion models are adapted to decode explicit surface primitives directly from latent space, enabling high-quality 3D scene generation with improved geometric accuracy and real-time rendering capabilities. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Generating…
26 -
Hugging Face Daily Papers research 8d ago
Escaping the Self-Confirmation Trap: An Execute-Distill-Verify Paradigm for Agentic Experience Learning
Abstract EDV is a three-stage framework that uses multiple heterogeneous agents to collaboratively construct reliable experiences for LLM agents, preventing self-confirmatory errors through execute-distill-verify processes. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
29 -
Hugging Face Daily Papers research 8d ago
FlowR2A: Learning Reward-to-Action Distribution for Multimodal Driving Planning
Abstract FlowR2A addresses the tension in multimodal driving planning by combining dense reward supervision with dynamic proposal generation through a flow-matching decoder that learns reward-conditioned action distributions. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
35 -
Hugging Face Daily Papers research 8d ago
An Efficient Method for the Optimal Control of Microgrids Under Uncertainties using Local Reduction
Abstract Two mathematical formulations for robust microgrid sizing and power scheduling are proposed and compared, with one using binary variables and big-M constraints and the other using continuous nonlinear programming with smooth reformulation of logical constraints.…
6 -
Hugging Face Daily Papers research 8d ago
Qwen-AgentWorld: Language World Models for General Agents
Abstract Language-based world models enable agentic environment simulation across multiple domains and enhance general agent performance through scalable simulation and improved downstream task performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct A world model predicts…
16 -
Hugging Face Daily Papers research 8d ago
NatureBench: Can Coding Agents Match the Published SOTA of Nature-Family Papers?
Abstract NatureBench presents a cross-disciplinary benchmark of 90 scientific tasks derived from Nature publications to assess AI coding agents' ability to achieve discovery rather than just reproduction, revealing that current agents primarily rely on methodological translation…
21 -
Hugging Face Daily Papers research 8d ago
DREAM: Dense Retrieval Embeddings via Autoregressive Modeling
Abstract DREAM trains dense retrieval embeddings using autoregressive language model attention mechanisms to supervise document-query similarity without requiring labeled examples. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Dense retrieval embedding models are a fundamental…
22 -
Hugging Face Daily Papers research 8d ago
FedOT: Ownership Verification and Leakage Tracing via Watermarks for Federated LDMs
Abstract FedOT is a novel framework that enables ownership verification and leakage tracing in federated latent diffusion models by introducing chunked watermarking and latent vector transformation to prevent watermark removal attacks. Generated by…
17 -
Hugging Face Daily Papers research 8d ago
ReMMD: Realistic Multilingual Multi-Image Agentic Verification for Multimodal Misinformation Detection
Abstract A comprehensive multimodal misinformation detection framework is introduced that handles complex, multilingual content with multiple images and diverse verification approaches, achieving superior performance while reducing computational costs. Generated by…
29 -
Hugging Face Daily Papers research 8d ago
Holistic Data Scheduler for LLM Pre-training via Multi-Objective Reinforcement Learning
Abstract A novel online data mixing framework called Holistic Data Scheduler uses reinforcement learning with a multi-objective reward function to optimize large language model pre-training efficiency and performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct The composition…
38 -
Hugging Face Daily Papers research 8d ago
Are Text-to-Image Models Inductivist Turkeys? A Counterfactual Benchmark for Causal Reasoning
Abstract Text-to-image models fail to generate counterfactual scenes because they rely on tightly coupled visual-textual patterns rather than causal reasoning, demonstrating limited understanding beyond pattern matching. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Text-to-image…
26 -
Hugging Face Daily Papers research 8d ago
MemGUI-Agent: An End-to-End Long-Horizon Mobile GUI Agent with Proactive Context Management
Abstract MemGUI-Agent addresses long-horizon mobile GUI task limitations through proactive context management using Context-as-Action (ConAct) to maintain critical information across extended sequences. Generated by Qwen/Qwen2.5-Coder-32B-Instruct MLLM-based mobile GUI agents…
32 -
Hugging Face Daily Papers research 8d ago
MobileForge: Annotation-Free Adaptation for Mobile GUI Agents with Hierarchical Feedback-Guided Policy Optimization
Abstract MobileForge enables efficient adaptation of mobile GUI agents through annotation-free learning by combining real app interaction grounding with hierarchical feedback-guided policy optimization. Generated by Qwen/Qwen2.5-Coder-32B-Instruct MLLM-based mobile GUI agents…
18 -
Hugging Face Daily Papers research 8d ago
Tapered Language Models
Abstract Tapered language models allocate more parameters to earlier layers and fewer to later layers, improving performance without increasing total parameters or compute costs. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Modern language models, including transformer,…
34 -
Hugging Face Daily Papers research 8d ago
VeriEvol: Scaling Multimodal Mathematical Reasoning via Verifiable Evol-Instruct
Abstract A novel framework called VeriEvol is introduced that addresses the challenge of scaling reinforcement learning for visual mathematical reasoning by ensuring reliable reward labels through a two-axis approach that separates prompt difficulty from answer reliability,…
17 -
Hugging Face Daily Papers research 8d ago
Beyond Reward Engineering: A Data Recipe for Long-Context Reinforcement Learning
Abstract A data-centric approach using curated datasets and a minimal GRPO setup significantly improves long-context reasoning in large language models, outperforming prior reinforcement learning methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Long-context reasoning is an…
15 -
Hugging Face Daily Papers research 8d ago
TROPT: An Open Framework for Unifying and Advancing Discrete Text Optimization
Abstract A unified open-source framework for discrete text-trigger optimization that standardizes the development and execution of optimization strategies across various domains and applications. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Discrete text-trigger optimization --…
18 -
Hugging Face Daily Papers research 8d ago
Lift4D: Harmonizing Single-View 3D Estimation for 4D Reconstruction In-the-Wild
Abstract Lift4D presents a test-time optimization framework that combines temporal consistency from single-view 3D reconstruction with deformable 3D Gaussian Splatting and view-conditioned diffusion priors to reconstruct dynamic non-rigid objects from monocular video. Generated…
15 -
Hugging Face Daily Papers research 8d ago
Comparing Linear Probes with Mahalanobis Cosine Similarity
Abstract The Mahalanobis cosine similarity provides a theoretically grounded method for comparing linear probes that correlates strongly with out-of-distribution performance metrics. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Linear probes are widely used in interpretability…
25 -
Hugging Face Daily Papers research 8d ago
ShotcreteDepth: A Bi-modal Dataset for Robust Robotic Depth Perception in Shotcrete Construction Environments
Abstract A bi-modal construction domain dataset combining stereo RGB and LiDAR data under challenging environmental conditions is introduced for autonomous system perception research. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We introduce ShotcreteDepth, a bi-modal dataset…
22 -
Hugging Face Daily Papers research 8d ago
Self-Compacting Language Model Agents
Abstract SelfCompact is a scaffolding approach that enables models to autonomously determine optimal compaction timing and methods for managing long agent traces, achieving better performance with reduced token costs compared to fixed-interval methods. Generated by…
13 -
Hugging Face Daily Papers research 8d ago
When Agents Commit Too Soon: Diagnosing Premature Commitment in LLM Agents
Abstract Premature commitment in long-horizon LLM agents leads to silent failures where agents defend early interpretations without considering alternatives, and hidden-state convergence serves as an early diagnostic for trajectory consistency. Generated by…
24 -
Hugging Face Daily Papers research 8d ago
Go-with-the-Track: Video Compositing and Motion Control with Point Tracking
Abstract Go-with-the-Track unifies motion control and reference image compositing in video generation by using point-track embeddings with spatial-aware encoding and video diffusion transformers. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Filmmaking demands precise motion…
32 -
Hugging Face Daily Papers research 8d ago
Libretto: Giving LLM Agents a Sense of Musical Structure
Abstract Libretto provides a structured framework for symbolic music generation and revision using LLM-native grammar and statistical evaluation across musical dimensions. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Generative music systems can now produce impressive audio from…
18 -
Hugging Face Daily Papers research 8d ago
A Verifiable Search Is Not a Learnable Chain-of-Thought
Abstract Training models on chain-of-thought demonstrations fails for tasks requiring backtracking search because the forward derivation cannot be faithfully imitated, demonstrating a fundamental limitation in learning search procedures through demonstration. Generated by…
11 -
Hugging Face Daily Papers research 8d ago
Vera: A Layered Diffusion Model for Content-Preserving Video Editing
Abstract Vera is a layered diffusion framework that preserves video content during editing by generating edit layers and alpha mattes through a Mixture-of-Transformers architecture. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Video diffusion models have enabled remarkable…
10 -
Hugging Face Daily Papers research 8d ago
Robusto-2: Benchmarking Humans & VLMs for Autonomous Driving in Lima & New York City
Abstract Research examines how self-driving car systems and humans perform on visual question answering tasks across different geographic locations, revealing that both human and AI responses diverge based on question types but show similar performance regardless of location.…
5 -
Hugging Face Daily Papers research 8d ago
An Exploratory Case Study of LLM-Assisted Refactoring and Gameplay Feature Generation in an Endless Runner Game
Abstract Large language models demonstrate varying effectiveness in software development tasks, successfully completing localized refactoring but showing limitations in integrating new gameplay features within existing game systems. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
24 -
Hugging Face Daily Papers research 9d ago
AC-ODM: Actor-Critic Online Data Mixing for Sample-Efficient LLM Pretraining
Abstract AC-ODM optimizes pretraining data composition for LLMs using reinforcement learning to improve convergence speed and downstream accuracy while maintaining computational efficiency. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Optimizing pretraining data composition is…
8 -
Hugging Face Daily Papers research 9d ago
Toward Parking Spot Occupancy Recognition: A Self-Supervised Approach
Abstract A self-supervised transfer learning approach for parking spot occupancy recognition that achieves high accuracy with minimal labeled data through two-stage training and deployment strategies. Generated by Qwen/Qwen2.5-Coder-32B-Instruct As urban areas expand, automatic…
22 -
Hugging Face Daily Papers research 9d ago
Capable but Careless: Do Computer-Use Agents Follow Contextual Integrity?
Abstract Computer-use agents frequently expose inappropriate information across applications, prompting the creation of AgentCIBench to evaluate and mitigate privacy risks in cross-application contexts. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Computer-use agents (CUAs) now…
7 -
Hugging Face Daily Papers research 9d ago
Demystifying Training-Time Augmentation for Data-Constrained Language Model Pretraining
Abstract Training-time data augmentation techniques help mitigate overfitting in autoregressive language model pretraining by delaying performance deterioration and improving final model quality when training on fixed datasets for many epochs. Generated by…
28 -
Hugging Face Daily Papers research 9d ago
Toward Open Weight Models Without Risks: Separating Public and Private Capabilities in LLMs
Abstract Tiered Language Models (TLMs) provide a framework for releasing large language models with configurable capability levels through secret keys that modify computation graphs while maintaining public model integrity. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
21 -
Hugging Face Daily Papers research 9d ago
Arbor: Explicit Geometric Conditioning for Controllable 3D Asset Generation
Abstract Arbor enables explicit 3D spatial control in text-conditioned latent generation through constraint meshes that define occupancy, avoidance, and contact regions, maintaining object quality while improving constraint adherence. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
26 -
Hugging Face Daily Papers research 9d ago
Grouped Query Experts: Mixture-of-Experts on GQA Self-Attention
Abstract Grouped Query Experts (GQE) improves Transformer efficiency by selectively activating query heads based on token content while maintaining key-value cache benefits of grouped-query attention. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Self-attention is central to…
25 -
Hugging Face Daily Papers research 9d ago
Training Open Models for Agentic Phone Use
Abstract PhoneBuddy combines real and mock app environments to improve training of open models for phone use, demonstrating enhanced task success rates through mixed reinforcement learning approaches. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Phones are becoming an important…
11 -
Hugging Face Daily Papers research 9d ago
Counsel: A Meta-Evaluation Dataset for Agentic Tasks
Abstract A large-scale dataset of human meta-evaluations of LLM critiques for agentic tasks is introduced to improve the calibration and reliability of automated evaluation methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct As agentic systems tackle increasingly complex…
22 -
Hugging Face Daily Papers research 9d ago
Notes2Skills: From Lab Notebooks to Certainty-Aware Scientific Agent Skills
Abstract Notes2Skills framework converts laboratory notes into verifiable skills for AI agents while maintaining author uncertainty levels, addressing gaps in scientific AI development. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Scientific discovery workflows usually contain…
27 -
Hugging Face Daily Papers research 9d ago
SkillHarness: Harnessing Safe Skills for Computer-Use Agents
Abstract SkillHarness is a framework that enables computer-use agents to safely learn and execute skills in dynamic environments by incorporating safety constraints and adaptive skill selection mechanisms. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Computer-Use Agents (CUAs)…
24 -
Hugging Face Daily Papers research 9d ago
Improving Text-to-Music Generation with Human Preference Rewards
Abstract A text-to-music generation system uses reward conditioning, expert iteration, and preference tuning to improve audio quality while maintaining efficiency within a 120M-parameter model framework. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We describe our entry to the…
19 -
Hugging Face Daily Papers research 9d ago
Deeper is Not Always Better: Mitigating the Alignment Tax via Confident Layer Decoding
Abstract Autoregressive generation in large language models traditionally uses the final layer for token prediction, but a new decoding strategy dynamically selects more reliable intermediate layers based on entropy-guided search, improving reasoning performance with minimal…
34 -
Hugging Face Daily Papers research 9d ago
Unlimited OCR Works
Abstract Unlimited OCR introduces Reference Sliding Window Attention to eliminate growing memory consumption during long-sequence OCR tasks, enabling efficient transcription of multiple pages in a single forward pass. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Recently,…
12 -
Hugging Face Daily Papers research 9d ago
Foresight: Failure Detection for Long-Horizon Robotic Manipulation with Action-Conditioned World Model Latents
Abstract A failure detection framework for long-horizon robotic tasks uses action-conditioned world models and functional conformal prediction to monitor manipulation trajectories with only final task labels. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Long-horizon tasks are…
8 -
Hugging Face Daily Papers research 9d ago
MeshFlow: Mesh Generation with Equivariant Flow Matching
Abstract MeshFlow generates triangle meshes directly using equivariant optimal-transport flow matching models with improved inference speed over autoregressive methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Meshes are among the most common 3D scene representations, but…
16