Hugging Face Daily Papers
500 articles archived
-
Hugging Face Daily Papers research 6d ago
The Verification Horizon: No Silver Bullet for Coding Agent Rewards
Abstract Verification challenges in AI agents arise from the difficulty of aligning proxy signals with human intent, requiring adaptive verification systems that evolve alongside generative capabilities. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) A classical intuition holds…
26 -
Hugging Face Daily Papers research 6d ago
GUI vs. CLI: Execution Bottlenecks in Screen-Only and Skill-Mediated Computer-Use Agents
Abstract Computer-use agents can execute software tasks through either graphical interfaces or programmatic command interfaces, but existing evaluations confound interaction modality with differences in tasks, initial states, verifiers, and permitted actions. We introduce a…
7 -
Hugging Face Daily Papers research 6d ago
Why Multi-Step Tool-Use Reinforcement Learning Collapses and How Supervisory Signals Fix It
Abstract Research investigates how different supervisory signals and training strategies improve the stability and performance of large language models in tool-use tasks, addressing issues like catastrophic collapse and format sensitivity through interleaved supervised…
21 -
Hugging Face Daily Papers research 6d ago
OpenBioRQ: Unsolved Biomedical Research Questions for Agents
Abstract A new biomedical benchmark evaluates agentic models' ability to verify sources and avoid false citations by testing unsolved research questions with no answer keys, revealing significant failures in retrieval-grounded reasoning and tool usage. Generated by…
9 -
Hugging Face Daily Papers research 6d ago
In-Context World Modeling for Robotic Control
Abstract ICWM enables robot policies to infer system variables from self-generated interactions, allowing adaptation to novel configurations without parameter updates by treating system identification as an in-context adaptation problem. Generated by…
8 -
Hugging Face Daily Papers research 6d ago
Confidence-Aware Tool Orchestration for Robust Video Understanding
Abstract Robust-TO addresses the Blind Trust Problem in video reasoning by integrating per-frame trustworthiness into an agentic framework that improves accuracy under realistic perturbations through calibrated evidence weighting and reliability-aware reasoning. Generated by…
17 -
Hugging Face Daily Papers research 6d ago
ViQ: Text-Aligned Visual Quantized Representations at Any Resolution
Abstract ViQ presents a visual quantization framework that balances semantic richness and detail preservation in discrete representations, enabling efficient multimodal training with native-resolution inputs. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) A unified representation…
26 -
Hugging Face Daily Papers research 6d ago
OPID: On-Policy Skill Distillation for Agentic Reinforcement Learning
Abstract An on-policy skill distillation framework extracts dense hindsight supervision from completed trajectories to improve language agent training efficiency and performance. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Outcome-based reinforcement learning provides a stable…
20 -
Hugging Face Daily Papers research 6d ago
DanceOPD: On-Policy Generative Field Distillation
Abstract A novel on-policy generative field distillation framework called DanceOPD is proposed to unify text-to-image generation, local editing, and global editing capabilities in flow-matching models through capability-specific routing and velocity-based training. Generated by…
10 -
Hugging Face Daily Papers research 6d ago
Physics Question Scene Graph: Fine-grained Evaluation of Physical Plausibility in Text-to-Video Generation
Abstract A vision-language model-based hierarchical question graph framework evaluates video generation models' adherence to physical laws with granular violation detection and human correlation validation. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Video generation models are…
23 -
Hugging Face Daily Papers research 6d ago
Do Thinking Tokens Help with Safety?
Abstract Research reveals that reasoning models' safety outcomes are predictable from early hidden representations, with deliberation appearing but not substantially influencing final responses, and current safety interventions inadvertently suppress genuine deliberation…
25 -
Hugging Face Daily Papers research 6d ago
Forecasting Future Behavior as a Learning Task
Abstract Behavior Forecasters are trained to predict large reasoning model outputs from single trajectories, outperforming large language models while requiring significantly less computational cost. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Trust in an AI system is often…
24 -
Hugging Face Daily Papers research 6d ago
Plans Don't Persist: Why Context Management Is Load Bearing for LLM Agents
Abstract Standard LLM agents rely on plan content remaining in context rather than maintaining it as persistent state, with evidence shown through replay pairing diagnostics and compression stress tests. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Long-horizon agents depend on…
27 -
Hugging Face Daily Papers research 6d ago
Speaker Identity in Non-Verbal Vocalizations: Conditional Distillation and Mixture of Experts Approach
Abstract A novel speaker verification framework combines frozen self-supervised features with ECAPA-TDNN and MoE modules to improve identity verification across both speech and non-verbal vocalizations while maintaining speech performance. Generated by…
30 -
Hugging Face Daily Papers research 6d ago
Lite Any Stereo V2: Faster and Stronger Efficient Zero-Shot Stereo Matching
Abstract Lite Any Stereo V2 (LAS2) presents an efficient stereo matching approach that achieves state-of-the-art accuracy with significantly reduced latency through optimized architecture and training strategies. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Recent advances in…
9 -
Hugging Face Daily Papers research 6d ago
PrivacyAlign: Contextual Privacy Alignment for LLM Agents
Abstract Researchers develop a human-centered approach to align AI agents with privacy norms by creating a comprehensive dataset of privacy judgments and using annotation-conditioned reward modeling to improve agent behavior. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) AI…
7 -
Hugging Face Daily Papers research 7d ago
What Intermediate Layers Know: Detecting Jailbreaks from Entropy Dynamics
Abstract Jailbreak attacks expose vulnerabilities in aligned large language models, revealing that harmful intent is encoded in structured intermediate uncertainty dynamics rather than output representations. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Jailbreak attacks reveal…
23 -
Hugging Face Daily Papers research 7d ago
Distill Once, Adapt Life-Long: Exploring Dataset Distillation for Continual Test-Time Adaptation
Abstract DO-ALL is a test-time adaptation framework that uses dataset distillation to create synthetic anchors for stable long-term model performance without retaining source data. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Continual Test-Time Adaptation (CTTA) aims to…
20 -
Hugging Face Daily Papers research 7d ago
ReNIO: Reweighting Negative Trajectory Importance for LLM On-Policy Distillation
Abstract ReNIO enhances on-policy distillation for language models by reweighting negative trajectories based on token-level probability ratios, improving reasoning performance in mathematical and code generation tasks. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) On-policy…
25 -
Hugging Face Daily Papers research 7d ago
Constraint Tax in Open-Weight LLMs: An Empirical Study of Tool Calling Suppression Under Structured Output Constraints
Abstract Tool Suppression occurs when JSON Schema constraints and tool calling are jointly enabled, preventing open-weight models from invoking tools despite maintaining schema compliance, with the issue stemming from grammar-based token masking that makes tool-call tokens…
5 -
Hugging Face Daily Papers research 7d ago
Autodata: An agentic data scientist to create high quality synthetic data
Abstract Autodata enables AI agents to function as data scientists who create high-quality training data through meta-optimization, demonstrating improved performance across multiple task domains. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) We introduce Autodata, a general…
30 -
Hugging Face Daily Papers research 7d ago
Improved Large Language Diffusion Models
Abstract Masked diffusion language models with fully bidirectional attention outperform autoregressive counterparts on various benchmarks while maintaining competitiveness with established models. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Modern large language models are…
18 -
Hugging Face Daily Papers research 7d ago
MVTrack4Gen: Multi-View Point Tracking as Geometric Supervision for 4D Video Generation
Abstract A novel-view video synthesis method enhances motion-aware diffusion models through multi-view point tracking supervision to improve geometric consistency and motion fidelity. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Synthesizing a novel-view video from a…
37 -
Hugging Face Daily Papers research 7d ago
ShutterMuse: Capture-Time Photography Guidance with MLLMs
Abstract Researchers developed a new benchmark and dataset for photography assistance, along with a unified multimodal model that provides both composition guidance and pose recommendations during image capture. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Real-world photography…
12 -
Hugging Face Daily Papers research 7d ago
The Hitchhiker's Guide to Agentic AI: From Foundations to Systems
Abstract The book provides a comprehensive guide to building autonomous AI systems, covering foundational elements like transformer architecture and training methods, along with advanced topics such as reinforcement learning, agent architectures, and production deployment.…
5 -
Hugging Face Daily Papers research 7d ago
RL-Index: Reinforcement Learning for Retrieval Index Reasoning
Abstract RL-Index introduces an agentic indexing framework that shifts reasoning from query time to indexing stage by using LLM-generated rationales and reinforcement learning to improve retrieval effectiveness and reduce latency. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
25 -
Hugging Face Daily Papers research 7d ago
CAVEWOMAN: How Large Language Models Behave Under Linguistic Input and Output Compression
Abstract Two-channel evaluation shows output compression reduces costs while input compression increases costs and degrades accuracy across models and datasets. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) "Talk short. Drop grammar. Save token." This caveman style is widely…
28 -
Hugging Face Daily Papers research 7d ago
When Lower Privileges Suffice: Investigating Over-Privileged Tool Selection in LLM Agents
Abstract LLM agents frequently select higher-privilege tools unnecessarily, and while safety alignment doesn't ensure least-privilege choices, a post-training defense can reduce excessive privilege use without sacrificing performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
26 -
Hugging Face Daily Papers research 7d ago
UnityShots: Memory-Driven Multi-Shot Audio-Video Generation with Boundary-Aware Gating
Abstract UnityShots is a memory-driven audio-video generation system that maintains consistent subject appearance and audio across video cuts using fixed-size long-term and short-term memory slots with boundary-conditioned gates and discrete cut-type priors. Generated by…
7 -
Hugging Face Daily Papers research 7d ago
V-Zero: Answer-Label-Free On-Policy Distillation with Contrastive Evidence Gating for Fine-Grained Visual Reasoning
Abstract A novel label-free framework for visual reasoning called V-Zero is presented, which uses contrastive evidence gating to improve fine-grained visual reasoning without requiring annotated answer labels, achieving faster training than traditional methods. Generated by…
12 -
Hugging Face Daily Papers research 7d ago
EBench: Elemental Diagnosis of Generalist Mobile Manipulation Policies
Abstract EBench is a comprehensive simulation benchmark for evaluating generalist mobile manipulation policies across diverse tasks and dimensions, revealing distinct capability profiles and generalization patterns among state-of-the-art models. Generated by…
18 -
Hugging Face Daily Papers research 7d ago
TryOnCrafter: Unleashing Camera Trajectories for Realistic Video Virtual Try-on via a Renderable 4D Try-on Proxy
Abstract A camera-controllable video virtual try-on framework uses a 4D proxy with explicit human-environment decoupling and DiT-based video generation for omnidirectional viewing. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) While Video Virtual Try-on (VVT) has achieved…
4 -
Hugging Face Daily Papers research 7d ago
Are We Ready For An Agent-Native Memory System?
Abstract Large language model agents' memory systems have evolved into complex data management frameworks requiring systematic evaluation across multiple modules and workloads to understand their performance characteristics and trade-offs. Generated by…
7 -
Hugging Face Daily Papers research 7d ago
Look Light, Think Heavy: What Multimodal Chain-of-Thought Reasoning Can and Cannot Do
Abstract Multimodal Chain-of-Thought reasoning shows selective effectiveness across different tasks, with limitations in maintaining visual introspection during reasoning processes. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Chain-of-Thought (CoT) has become a standard method…
17 -
Hugging Face Daily Papers research 7d ago
DomainShuttle: Freeform Open Domain Subject-driven Text-to-video Generation
Abstract DomainShuttle enables open domain subject-driven text-to-video generation with high fidelity and flexibility across in-domain and cross-domain scenarios through domain-aware modeling and dual RoPE schemes. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Open domain…
10 -
Hugging Face Daily Papers research 7d ago
RoPE-Aware Bit Allocation for KV-Cache Quantization
Abstract Block-GTQ introduces a RoPE-aware bit allocation method for key-cache quantization that improves attention accuracy and downstream performance through adaptive bit distribution and packed cache serving. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Existing low-bit…
22 -
Hugging Face Daily Papers research 7d ago
Beyond NL2Code: A Structured Survey of Multimodal Code Intelligence
Abstract This survey explores multimodal code intelligence systems that generate and reason with code based on visual inputs, categorizing approaches across GUI, scientific visualization, structured graphics, and emerging frameworks while identifying verification-centered…
25 -
Hugging Face Daily Papers research 7d ago
IV-CoT: Implicit Visual Chain-of-Thought for Structure-Aware Text-to-Image Generation
Abstract Implicit Visual Chain-of-Thought decomposes visual conditioning into structural and semantic cascades for improved structure-aware image generation with sketch supervision. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Unified multi-modal large language models (MLLMs)…
7 -
Hugging Face Daily Papers research 7d ago
Advancing WordArt-Oriented Scene Text Recognition: Datasets and Methods
Abstract A large-scale synthetic dataset and specialized model architecture are introduced to address the challenges of artistic text recognition by improving data diversity and model flexibility for irregular text layouts. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) WordArt…
9 -
Hugging Face Daily Papers research 7d ago
Wan-Streamer v0.1: End-to-end Real-time Interactive Foundation Models
Abstract Wan-Streamer is a unified, end-to-end multimodal model that enables real-time audio-visual interaction through causal attention mechanisms and integrated processing of visual, audio, and text modalities. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) We present…
20 -
Hugging Face Daily Papers research 7d ago
MEMPROBE: Probing Long-Term Agent Memory via Hidden User-State Recovery
Abstract Long-term memory in LLM agents should be evaluated as an auditable post-interaction artifact by reconstructing structured user state from the agent's memory, as demonstrated by MEMPROBE, a benchmark testing memory recovery against synthetic ground truth across 50…
21 -
Hugging Face Daily Papers research 7d ago
Critique of Agent Model
Abstract True artificial agency requires internalized structures for goals, identity, decision-making, self-regulation, and learning, distinguishing autonomous systems from task-specific ones. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) What is an agent? What constitutes…
24 -
Hugging Face Daily Papers research 7d ago
InSight: Self-Guided Skill Acquisition via Steerable VLAs
Abstract InSight enables autonomous skill acquisition for vision-language-action models through primitive-action level steerability and automated demonstration generation. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Vision-language-action (VLA) models can learn manipulation…
19 -
Hugging Face Daily Papers research 7d ago
Multi4D: High-Fidelity Dynamic Gaussian Splatting via Multi-Level Competitive Allocation
Abstract Multi4D addresses the trade-off between motion consistency and visual fidelity in dynamic 3D Gaussian splatting through a multi-level competitive allocation framework that enables adaptive specialization and efficient representation. Generated by…
21 -
Hugging Face Daily Papers research 7d ago
Semantic Browsing: Controllable Diversity for Image Generation
Abstract Text-to-image models are enhanced with controlled diversity through semantic browsing capabilities that enable structured navigation of image variations based on meaningful semantic decisions. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Modern text-to-image models…
4 -
Hugging Face Daily Papers research 8d ago
AGORA: An Archive-Grounded Benchmark for Agentic Workplace Document Reasoning
Abstract Large language models face challenges in archive-grounded reasoning tasks involving evidence retrieval and synthesis across diverse document collections, with performance varying significantly across domains. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Large language…
26 -
Hugging Face Daily Papers research 8d ago
ChartWalker: Benchmarking the Cross-Chart RAG Task
Abstract ChartWalker presents a novel framework for cross-chart retrieval-augmented generation with hierarchical knowledge graph construction and structure-aware sampling for challenging multi-modal analytical tasks. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Cross-Chart…
33 -
Hugging Face Daily Papers research 8d ago
QG-MIL: A Gated Transformer Aggregator for Domain-Agnostic Multiple Instance Learning in Medical Imaging
Abstract QG-MIL introduces a gated transformer aggregator for multiple instance learning in medical imaging that stabilizes attention distribution and improves prediction consistency across different medical domains. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Attention-based…
38 -
Hugging Face Daily Papers research 8d ago
EventVLA: Event-Driven Visual Evidence Memory for Long-Horizon Vision-Language-Action Policies
Abstract EventVLA addresses long-horizon robotic manipulation challenges by introducing a sparse visual evidence memory framework with visual anchors and a dynamic Keyframe Evidence Memory module for improved task performance. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Memory…
23