Hugging Face Daily Papers
500 articles archived
-
Hugging Face Daily Papers research 21d ago
Redesign Mixture-of-Experts Routers with Manifold Power Iteration
Abstract Researchers propose a novel router redesign for Mixture-of-Experts models that aligns router rows with the principal singular directions of expert matrices using Manifold Power Iteration to improve model effectiveness. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Router…
33 -
Hugging Face Daily Papers research 21d ago
Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks
Abstract A new benchmark and adapter protocol called Claw-SWE-Bench enables fair comparison of diverse coding agents by standardizing evaluation conditions and revealing the importance of adapter design for effective code generation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
16 -
Hugging Face Daily Papers research 21d ago
ComBench: A Benchmark for Rigorous Proof Reasoning and Constructive Realization in Olympiad-Level Combinatorics
Abstract A new benchmark called ComBench is introduced to evaluate large language models' combinatorial reasoning abilities through Olympiad-level problems that test both proof construction and explicit mathematical constructions. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
37 -
Hugging Face Daily Papers research 21d ago
Toward Generalist Autonomous Research via Hypothesis-Tree Refinement
Abstract An AI framework called Arbor enables autonomous scientific research by combining strategic coordination, isolated hypothesis testing, and a persistent knowledge tree to iteratively improve research outcomes across multiple domains. Generated by…
18 -
Hugging Face Daily Papers research 21d ago
TRL-Bench: Standardizing Cross-Paradigm Representation-Level Evaluation of Tabular Encoders
Abstract TRL-Bench establishes a standardized benchmark for evaluating tabular representation learning models across multiple granularities, revealing that encoder performance varies by task type and requires capability-specific assessment rather than single leaderboard…
6 -
Hugging Face Daily Papers research 21d ago
Verifiable Environments Are LEGO Bricks: Recursive Composition for Reasoning Generalization
Abstract A recursive automated composition framework enables scalable reinforcement learning for language models by automatically combining verifiable environments through compositional operators. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Reinforcement Learning (RL) with…
11 -
Hugging Face Daily Papers research 21d ago
Beyond Scalar Rewards by Internalizing Reasoning into Score Distributions
Abstract A teacher-student framework decouples complex reasoning from efficient reward deployment in text-to-image training, achieving superior preference accuracy and optimization performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Reward models are central to…
22 -
Hugging Face Daily Papers research 21d ago
Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application
Abstract Large language model agents require specialized environments for training and evaluation, which can be categorized by their engineering lifecycle stages and evolved through various paradigms including neural and symbolic approaches. Generated by…
8 -
Hugging Face Daily Papers research 21d ago
Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models
Abstract Embodied-R1.5 is a unified embodied foundation model that integrates embodied reasoning capabilities and achieves state-of-the-art performance on embodied vision-language benchmarks through a multi-task balanced reinforcement learning approach. Generated by…
35 -
Hugging Face Daily Papers research 21d ago
InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning
Abstract InternVideo3 enhances long-horizon multimodal tasks through Multimodal Contextual Reasoning and efficient attention mechanisms, demonstrating strong performance on video understanding benchmarks and video agent capabilities. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
18 -
Hugging Face Daily Papers research 21d ago
i1: A Simple and Fully Open Recipe for Strong Text-to-Image Models
Abstract A comprehensive experimental study of text-to-image diffusion models reveals key design choices and training insights leading to the development of i1, a 3B-parameter model that matches leading performance while maintaining full openness. Generated by…
21 -
Hugging Face Daily Papers research 21d ago
World Model Self-Distillation: Training World Models to Solve General Tasks
Abstract A scalable framework combines self-distillation and reinforcement learning to transfer task-solving abilities from vision-language models to video diffusion models without requiring labeled task-video data. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Pretrained video…
15 -
Hugging Face Daily Papers research 21d ago
Breaking Entropy Bounds: Accelerating RL Training via MTP with Rejection Sampling
Abstract Bebop addresses the efficiency bottleneck in reinforcement learning training of large language models by optimizing multi-token prediction techniques through entropy-aware sampling and novel training objectives that improve acceptance rates and inference throughput.…
28 -
Hugging Face Daily Papers research 21d ago
World Pilot: Steering Vision-Language-Action Models with World-Action Priors
Abstract World Pilot enhances Vision-Language-Action models by incorporating dynamic scene evolution and trajectory priors from a World-Action Model, achieving superior performance in zero-shot out-of-distribution manipulation tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
10 -
Hugging Face Daily Papers research 21d ago
Lius: Translation Model Based Instructional Lingustic Using Continual Instruction Tuning In Kupang Malay
Abstract Continual Instruction Tuning enables effective fine-tuning of large language models for low-resource language translation, achieving superior performance compared to standard instruction tuning and multilingual models. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large…
4 -
Hugging Face Daily Papers research 21d ago
DeNovoSWE: Scaling Long-Horizon Environments for Generating Entire Repositories from Scratch
Abstract A large-scale dataset called DeNovoSWE is introduced for training code agents to generate entire software repositories from documentation, significantly improving performance on long-horizon software engineering tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct As the…
15 -
Hugging Face Daily Papers research 21d ago
ICA Lens: Interpreting Language Models Without Training Another Dictionary
Abstract Independent component analysis (ICA) is revived as an efficient method for discovering interpretable directions in language model representations, offering a faster alternative to sparse autoencoder training while maintaining competitive performance in probing tasks.…
22 -
Hugging Face Daily Papers research 22d ago
PaperMentor: A Human-Centered Multi-Agent Writing Tutor for AI Research Papers on Overleaf
Abstract A human-centered writing assistant system called PaperMentor integrates expert research advice with specialized agents to provide actionable feedback during manuscript drafting, outperforming AI baselines in usability and relevance. Generated by…
38 -
Hugging Face Daily Papers research 22d ago
When Behavioral Safety Evaluation Fails: A Representation-Level Perspective
Abstract Behavioral safety evaluations of large language models provide incomplete insights into internal robustness, as demonstrated by the audit gap between observable outputs and latent space vulnerabilities revealed through intervention-based testing. Generated by…
38 -
Hugging Face Daily Papers research 22d ago
In-Context Multiple Instance Learning
Abstract Pretraining a Perceiver-style architecture on synthetic bag-structured data enables efficient, task-adaptive classification from few labeled examples in multiple instance learning scenarios. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multiple Instance Learning (MIL)…
10 -
Hugging Face Daily Papers research 22d ago
MilliVid: Hierarchical Latents for Long-Range Consistency in Video Generation
Abstract Video generative models achieve improved long-range consistency through coarse-to-fine token generation using a multi-scale autoencoder and diffusion model architecture. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Video generative models have become increasingly…
28 -
Hugging Face Daily Papers research 22d ago
Decentralized Multi-Agent Systems with Shared Context
Abstract The Decentralized Language Models (DeLM) framework enables scalable large language model reasoning through parallel agents that asynchronously coordinate via a shared verified context, improving performance and efficiency over centralized approaches. Generated by…
25 -
Hugging Face Daily Papers research 22d ago
SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction
Abstract SkillHarm is a benchmark for evaluating skill-based attacks across the skill-use lifecycle, demonstrating significant vulnerabilities in current agents with attack success rates up to 86.3%. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Agent skills occupy a privileged…
36 -
Hugging Face Daily Papers research 22d ago
Do Coding Agents Deceive Us? Detecting and Preventing Cheating via Capped Evaluation with Randomized Tests
Abstract The CapCode framework uses randomized testing with performance caps to detect and prevent shortcut exploitation in agent evaluation, while CapReward rewards systems that adhere to intended task specifications. Generated by Qwen/Qwen2.5-Coder-32B-Instruct A growing failure…
21 -
Hugging Face Daily Papers research 22d ago
The Role of Feedback Alignment in Self-Distillation
Abstract Self-distillation effectiveness depends on structural alignment between feedback and solver reasoning, with step-aligned critique outperforming binary rewards and reference solutions by targeting specific reasoning failures. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
32 -
Hugging Face Daily Papers research 22d ago
Next Forcing: Causal World Modeling with Multi-Chunk Prediction
Abstract Next Forcing introduces a multi-chunk prediction framework that accelerates training and inference for autoregressive video generation while improving accuracy and physical law adherence. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Autoregressive video generation has…
19 -
Hugging Face Daily Papers research 22d ago
FadeMem: Distance-Aware Memory Consolidation for Autoregressive Video Diffusion
Abstract FadeMem introduces a distance-aware key-value memory consolidation mechanism that organizes historical video data into a temporal hierarchy, improving long-video generation by preserving recent context and long-range anchors under fixed cache constraints. Generated by…
36 -
Hugging Face Daily Papers research 22d ago
Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning
Abstract CPPO addresses limitations in reinforcement learning with verifiable rewards by introducing position-weighted thresholds and cumulative prefix budgeting to better handle autoregressive generation challenges. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Reinforcement…
12 -
Hugging Face Daily Papers research 22d ago
Interpreting and Steering a Text-to-Speech Language Model with Sparse Autoencoders
Abstract Sparse autoencoders trained on language model representations reveal interpretable features for speech synthesis that can be manipulated to control linguistic and prosodic attributes. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Language models increasingly serve as the…
19 -
Hugging Face Daily Papers research 22d ago
Kwai Keye-VL-2.0 Technical Report
Abstract Kwai Keye-VL-2.0-30B-A3B is an open-source Mixture-of-Experts multimodal foundation model that enables long-video understanding and agentic intelligence through DeepSeek Sparse Attention and specialized training infrastructure. Generated by…
36 -
Hugging Face Daily Papers research 22d ago
IR3DE: A Linear Router for Large Language Models
Abstract A ridge regression-based routing method achieves competitive performance in selecting domain-expert LLMs for different tasks while enabling dynamic addition/removal of experts without retraining. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Foundational Large Language…
28 -
Hugging Face Daily Papers research 22d ago
PsychoSafe: Eliciting Psychologically-Informed Refusals in Large Language Models
Abstract A psychologically-informed refusal framework called PsychoSafe is developed for large language models to improve harmful request handling through structured supportive communication, showing enhanced refusal quality and resource referral while maintaining performance on…
14 -
Hugging Face Daily Papers research 22d ago
BrainSurgery: Reproducible and Reliable Declarative Weight Manipulations for Model Editing and Upcycling
Abstract BrainSurgery is a tool for robust and reproducible tensor manipulation of neural network checkpoints through declarative YAML plans with built-in validation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct As deep learning models scale, managing, inspecting, and modifying…
12 -
Hugging Face Daily Papers research 22d ago
UniPET: a universal network for high-quality PET image denoising across varied dose reduction factors
Abstract A universal PET image denoising framework addresses variability in dose reduction factors through domain generalization techniques and region-aware learning strategies. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Most existing deep learning-based PET image denoising…
26 -
Hugging Face Daily Papers research 22d ago
U-TTT: Towards Generalizable PET Image Denoising via Test-Time Training
Abstract A novel U-shaped deep learning model with test-time training layers and dual-domain adaptation mechanisms achieves robust PET image denoising under distribution shifts. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Existing deep learning models for Positron Emission…
32 -
Hugging Face Daily Papers research 22d ago
Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution
Abstract The Role-Agent framework enables LLM agents to function as both agent and environment through bootstrapped co-evolution, improving performance via environment-aware reasoning and targeted practice. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Although Large Language Model…
33 -
Hugging Face Daily Papers research 22d ago
Late-Layer Fusion is Enough: Dual-Path Vision Token Routing for Multimodal Large Language Models under Visual Saturation
Abstract Research reveals that vision and text tokens in multimodal models evolve asynchronously, leading to inefficient computation; a new asymmetric routing framework reduces visual processing overhead while maintaining performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
9 -
Hugging Face Daily Papers research 22d ago
MemDreamer: Decoupling Perception and Reasoning for Long Video Understanding via Hierarchical Graph Memory and Agentic Retrieval Mechanism
Abstract MemDreamer addresses long-video understanding challenges by decoupling perception and reasoning through hierarchical graph memory and agentic exploration, achieving state-of-the-art performance with reduced computational overhead. Generated by…
33 -
Hugging Face Daily Papers research 22d ago
Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional Fields
Abstract Current AI agents struggle with long-horizon professional GUI workflows, achieving low success rates due to issues with workflow consistency and domain-specific software understanding. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Recent years have witnessed the rapid…
15 -
Hugging Face Daily Papers research 22d ago
Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts
Abstract Retrospective Harness Optimization (RHO) is a self-supervised method that improves AI agent performance by optimizing the agent harness using only past trajectories through diverse task selection, parallel re-solving, and self-validation techniques. Generated by…
8 -
Hugging Face Daily Papers research 22d ago
Lip Forcing: Few-Step Autoregressive Diffusion for Real-time Lip Synchronization
Abstract An autoregressive diffusion method for video-to-video lip synchronization achieves real-time performance through distillation and optimized inference schedules. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Diffusion-based lip synchronization models achieve strong visual…
29 -
Hugging Face Daily Papers research 22d ago
Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models
Abstract Flow-DPPO replaces ratio clipping with divergence proximal constraints in flow matching models, improving training stability and multi-objective optimization through exact KL divergence computation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Recent work has…
34 -
Hugging Face Daily Papers research 22d ago
Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories
Abstract A multi-agent framework automates data journalism by generating evidence-grounded, multimodal news stories while maintaining transparency and verifiability. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Data tells stories that shape society; the data journalist's job is…
10 -
Hugging Face Daily Papers research 22d ago
WorldOlympiad: Can Your World Model Survive a Triathlon?
Abstract WorldOlympiad presents a comprehensive benchmark for evaluating video-based world models across physical faithfulness, geometric consistency, and interaction fidelity, revealing significant gaps in current generative models' capabilities. Generated by…
13 -
Hugging Face Daily Papers research 22d ago
Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It
Abstract Chain-of-thought supervised fine-tuning degrades long-context recall in hybrid linear-attention models by biasing attention gradients toward short-range patterns, but a training-free method called QK-Restore can restore long-context capabilities by reverting query-key…
8 -
Hugging Face Daily Papers research 22d ago
Rethinking the Divergence Regularization in LLM RL
Abstract DRPO improves LLM reinforcement learning stability by replacing hard masks with smooth regularization that provides continuous gradient corrections beyond trust-region boundaries. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Reinforcement learning (RL) has become a key…
29 -
Hugging Face Daily Papers research 22d ago
EEVEE: Towards Test-time Prompt Learning in the Real World for Self-Improving Agents
Abstract EEVEE is a novel test-time prompt learning framework for LLM agents that handles heterogeneous data streams through task clustering and co-evolving router-prompt optimization. Generated by Qwen/Qwen2.5-Coder-32B-Instruct In this paper, we propose EEVEE, the first…
6 -
Hugging Face Daily Papers research 22d ago
SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research
Abstract A large language model trained on synthesized delegation intelligence achieves superior performance on long-horizon research tasks through task decomposition and subagent coordination. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large language models are increasingly…
12 -
Hugging Face Daily Papers research 22d ago
One Token per Multimodal Evidence: Latent Memory for Resource-Constrained QA
Abstract Latent Memory introduces a compressed representation approach for external memory in question answering, reducing token consumption and storage requirements while maintaining competitive performance across text-only and multimodal benchmarks. Generated by…
28 -
Hugging Face Daily Papers research 22d ago
Struct-Searcher: Agentic Structural Thinking Advances Multimodal Deep Information Seeking
Abstract Struct-Searcher introduces a belief revision theory-based structural agentic workflow for multimodal information seeking that improves accuracy over existing vision-language models and deep research agents. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Deep research…
17