Robotics
194 articles archived under #robotics

Hugging Face Daily Papers research 22d ago WorldCraft: From Camera Navigation to Object Manipulation in Interactive Video World Models Abstract WorldCraft extends interactive video world models to enable object-level trajectory control while maintaining camera navigation capabilities through specialized control pipelines. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Recent video-based world models have made… 27

r/LocalLLaMA community 22d ago Jetson Orin NX Build for Hermes Agent + Benchmarking I had a huge LLM server, and now I have a tiny one! I had a Jetson Orin NX gathering dust from a long dead robotics project, from back in the Llama-7B days. I figured now with MoE and smaller models doing well, it was time to mess with it again. Goal: As silent as possible… 34

The Information — AI news-outlet 23d ago U.S. Accuses Alibaba, Baidu, Others of Aiding Chinese Military in Blacklist Move The U.S. Department of Defense on Monday added more than a dozen Chinese tech companies including Alibaba and Baidu to a blacklist, a move that could further escalate tensions between the world’s two largest economies. Electric vehicle makers Byd and Nio, humanoid maker Unitree,… 25

Hugging Face Daily Papers research 23d ago AHA-WAM: Asynchronous Horizon-Adaptive World-Action Modeling with Observation-Guided Context Routing Abstract AHA-WAM is an asynchronous world-action model that uses dual Diffusion Transformers to enable efficient long-horizon planning and real-time action execution in robotic manipulation tasks.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct World-action models have emerged as a… 4

Hugging Face Daily Papers research 23d ago OASIS: From Simulation Data Collection to Real-World Humanoid Loco-Manipulation Abstract A simulation-data-driven framework for humanoid loco-manipulation that uses 3D generative models to create realistic assets and hierarchical visuomotor policies trained on simulated data achieves better zero-shot performance than real-robot training. Generated by… 24

Hugging Face Daily Papers research 23d ago Robots Need More than VLA and World Models Abstract Robot intelligence advancement requires integrating unstructured behavioral data through specialized interfaces for labeling, embodiment mapping, world modeling, and reward inference rather than relying solely on policy scaling. Generated by… 27

arXiv — NLP / Computation & Language research 24d ago The Sim-to-Real Gap of Foundation Model Agents: A Unified MDP Perspective arXiv:2606.07017v1 Announce Type: cross Abstract: Foundation model agents are increasingly deployed for real-world decision-making, but suffer from the sim-to-real gap. While robotics and classical control have mature frameworks to address this gap, the foundation model… 5

Hugging Face Daily Papers research 24d ago LIMMT: Less is More for Motion Tracking Abstract Training with high-quality motion data improves tracking policy optimization trajectories, with minimal data subsets outperforming full datasets in physics-based humanoid motion tracking. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We argue that high-quality motion… 24

Hugging Face Daily Papers research 26d ago The Road Ahead in Autonomous Driving: The KITScenes Multimodal Dataset Abstract KITScenes Multimodal dataset provides high-fidelity European driving data with comprehensive 3D maps and diverse urban environments for embodied AI research.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct Existing autonomous driving datasets have enabled major progress,… 36

Hugging Face Daily Papers research 26d ago AffordanceVLA: A Vision-Language-Action Model Empowering Action Generation through Affordance-Aware Understanding Abstract AffordanceVLA introduces a unified framework that uses structured affordance forecasting as an intermediate representation to improve the precision of perception-action mapping in robotic manipulation by leveraging vision-language models. Generated by… 4

r/MachineLearning community 26d ago I'm looking to join/form a team working on physical AI robotics challenge [P] Hey all, I'm a robotics engineer by training turned ML/AI engineer because of passion right after school. I want to start combining these skills together and I think a competition is the best way of doing it. Here's an example of a challenge I'm talking about to set expectations… 18

Hugging Face Daily Papers research 26d ago World-Language-Action Model for Unified World Modeling, Language Reasoning, and Action Synthesis Abstract World-language-action models combine textual instruction processing with robot state prediction through an autoregressive transformer backbone, enabling efficient long-horizon task execution and cross-embodiment learning. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We… 7

Hugging Face Daily Papers research 27d ago Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation? Abstract Video generation models were evaluated through robotic manipulation tasks to assess their ability to reflect physical reality, revealing that visual quality does not predict executable motion accuracy.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct Video generation models… 20

Hugging Face Daily Papers research 27d ago SEAOTTER: Sensor Embedded Autoencoding with One-Time Transcode for Efficient Reconstruction Abstract A compression framework for cloud robotics combines learned latent representations with standard JPEG compatibility to achieve faster encoding and decoding while maintaining high perceptual quality. Generated by Qwen/Qwen2.5-Coder-32B-Instruct In robotics systems, vast… 31

r/MachineLearning community 27d ago Would you say capture-time semantic annotation for robot trajectories is a solved problem? [R] It seems raw teleoperation data (RGB + joint states) structurally lacks affordance, contact intent, and embodiment-specific kinematic context. (information that can't be reliably recovered post-hoc once the demonstration is recorded) Most current approaches either filter/clean… 11

Hugging Face Daily Papers research 27d ago RobotValues: Evaluating Household Robots When Human Values Conflict Abstract RobotValues benchmark evaluates household robot planners in value-conflict scenarios, revealing that vision-language models exhibit default value preferences and struggle to override them when instructed to prioritize conflicting values.
Generated by… 8

arXiv — Machine Learning research 27d ago Flash-WAM: Modality-Aware Distillation for World Action Models arXiv:2606.05254v1 Announce Type: new Abstract: World-action models (WAMs) jointly generate future video and robot actions through iterative diffusion, achieving strong performance on manipulation benchmarks but requiring tens of denoising steps, a cost that precludes real-time… 13

arXiv — Machine Learning research 27d ago What Objects Enable, Not What They Are: Functional Latent Spaces for Affordance Reasoning arXiv:2606.05533v1 Announce Type: new Abstract: Existing robot planning systems rely on appearance-based reasoning, where visual observations are encoded into latent spaces organized around object appearances (e.g., recognizing a "cart" based on how it looks). However, planning… 13

Ars Technica — AI news-outlet 27d ago The skeptic’s guide to humanoid robots going viral on the Internet Robot demonstrations can distort public perceptions of robotic capabilities. 9

Dwarkesh Podcast news-outlet 27d ago Alex Imas and Phil Trammell – What remains scarce after AGI? “One robot now turns into many robots next year, but the number of ballerinas is the same.” 37

TechCrunch — AI news-outlet 27d ago Is Silicon Valley ready to put robots in people’s homes? Hello Robot is. The California startup released the fourth-generation of its home assistance robot, Stretch. 30

Hugging Face Daily Papers research 27d ago PaintBench: Deterministic Evaluation of Precise Visual Editing Abstract PaintBench presents a scalable benchmark for precise visual editing tasks, revealing low performance across models and identifying key challenges in geometric transformations and structural manipulations.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct While current… 12

Hugging Face Daily Papers research 28d ago Cosmos 3: Omnimodal World Models for Physical AI Abstract Cosmos 3 is an omnimodal world model that processes and generates multiple data types through a unified mixture-of-transformers architecture, achieving state-of-the-art performance in various understanding and generation tasks. Generated by… 38

Hugging Face Daily Papers research 28d ago OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs Abstract OVO-S-Bench presents a comprehensive benchmark for evaluating streaming spatial intelligence in multimodal language models through human-annotated questions spanning multiple abstraction levels. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multimodal agents in robotics,… 23

arXiv — NLP / Computation & Language research 28d ago Hybrid Adversarial Defence for Natural Language Understanding Tasks arXiv:2606.04612v1 Announce Type: new Abstract: Large Language Models (LLMs) are vulnerable both to hallucination and adversarial manipulation. Although these problems are closely related, existing defences typically address them separately. We investigate a hybrid defence… 21

arXiv — NLP / Computation & Language research 28d ago Dive into the Scene: Breaking the Perceptual Bottleneck in Vision-Language Decision Making via Focus Plan Generation arXiv:2606.04046v1 Announce Type: cross Abstract: In embodied vision-language decision making tasks such as robotic manipulation and navigation, Vision-Language and Vision-Language-Action Models (VLMs & VLAs) are powerful tools with different benefits: VLMs are better at… 31

Hugging Face Daily Papers research 28d ago GRAIL: Generating Humanoid Loco-Manipulation from 3D Assets and Video Priors Abstract GRAIL generates diverse humanoid manipulation and locomotion data through 3D asset composition and video foundation models, enabling effective sim-to-real transfer for robot control.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct Scaling humanoid loco-manipulation… 9

Hugging Face Daily Papers research 28d ago AURA: Action-Gated Memory for Robot Policies at Constant VRAM Abstract AURA-Mem is a recurrent memory system that adapts to embodied AI constraints by writing only when observations affect actions, significantly reducing memory writes compared to traditional KV-cache approaches. Generated by Qwen/Qwen2.5-Coder-32B-Instruct The KV-cache is… 7

Hugging Face Daily Papers research 28d ago Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking Abstract Humanoid-GPT is a GPT-style Transformer with causal attention trained on a billion-scale motion corpus that achieves zero-shot generalization to unseen motions and control tasks through scalable pre-training on diverse motion data. Generated by… 29

Hugging Face Daily Papers research 29d ago AFUN: Towards an Affordance Foundation Model for Functionality Understanding Abstract Affordance understanding model predicts functional masks and 3D motion curves from RGB-D observations and language descriptions, enabling generalizable robot manipulation across diverse environments. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Affordance understanding… 36

Hugging Face Daily Papers research 29d ago τ_0-WM: A Unified Video-Action World Model for Robotic Manipulation Abstract A unified video-action world model integrates policy learning, video prediction, and action evaluation using a shared video diffusion backbone for robotic manipulation tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Robotic manipulation requires models that generate… 22

Hugging Face Daily Papers research 1mo ago Silent Failures in Physical AI: A Literature Review of Runtime Action Authorization for Autonomous Systems Abstract Physical AI systems face safety challenges where black-box models can execute harmful actions without detection, necessitating comprehensive runtime guardrail mechanisms for safe operation.
AI-generated summary Physical AI systems increasingly map multimodal… 12

Hugging Face Daily Papers research 1mo ago Can Predicted Dynamics Exist in the Physical World? Abstract Physical admissibility validation for AI systems uses prediction-control interfaces with kinematic and dynamic conditions to filter invalid proposals while maintaining high performance. AI-generated summary Predictive Physical AI systems output state rollouts, action… 33

arXiv — Machine Learning research 1mo ago From Demonstrations to Rewards: Test-Time Prompt Optimization for VLM Reward Models arXiv:2606.00083v1 Announce Type: new Abstract: Reinforcement learning relies on accurate reward functions, which are often hand-crafted or even unavailable in real-world applications, such as robotics. Recent work has explored the zero-shot reasoning capabilities of pre-trained… 11

arXiv — NLP / Computation & Language research 1mo ago DiscourseFlip: An Oblique Discourse-Level Opinion Manipulation Attack against Black-box Retrieval-Augmented Generation arXiv:2606.01212v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) systems are widely deployed and increasingly influential, but their reliance on external corpora exposes new security risks from poisoned retrieval content. Existing RAG attacks are largely… 23

Hugging Face Daily Papers research 1mo ago RoboSemanticBench: Diagnosing Semantic Grounding in Action Prediction for VLA Models Abstract RoboSemanticBench identifies a disconnect between semantic understanding and action prediction in vision-language-action models, where robots can grasp objects but fail to select semantically correct targets.
AI-generated summary Vision-language-action (VLA) models are… 15

Hugging Face Daily Papers research 1mo ago RoboStressBench: Benchmarking VLM Robustness to Physical Visual Stress in Embodied Scenes Abstract RoboStressBench presents a principled benchmark for evaluating vision-language model robustness to physical visual stress in embodied AI, decomposing visual stress into material, viewpoint, lighting, and geometry dimensions. AI-generated summary Vision-Language Models… 4

r/LocalLLaMA community 1mo ago NVIDIA GB300 Grace Blackwell Ultra pricetags https://www.scan.co.uk/shop/ai-and-robotics/workstations-ai/nvidia-dgx-station submitted by /u/X-N2O 5

Ars Technica — AI news-outlet 1mo ago Allegedly trashing Airbnbs to test robots puts startup in legal trouble Lawsuit seeks $12,000 from startup that allegedly damaged home in robot tests. 28

Hugging Face Daily Papers research 1mo ago Memory-Bound but Not Bandwidth-Limited: The Physical AI Inference Gap in Batch-1 LLM Decode Abstract Batch-1 autoregressive decoding in physical AI systems shows that memory bandwidth alone doesn't fully explain latency, with GPU speedup limited by launch overheads and quantization efficiency varying significantly across hardware platforms.
AI-generated summary… 16

r/LocalLLaMA community 1mo ago How to build a shitty robot submitted by /u/badlogicgames 35

Hugging Face official-blog 1mo ago Welcome NVIDIA Cosmos 3: The First Open Omni-model for Physical AI Reasoning and Action Published June 1, 2026 Asawaree asawareeb nvidia Atharva Joshi atharvajoshi10 nvidia NVIDIA Cosmos 3 is here - and it's available on Hugging… 23

NVIDIA Developer Blog official-blog 1mo ago Develop Physical AI Reasoning, World, and Action Models with NVIDIA Cosmos 3 Physical AI systems must understand the real world before they can act within it. Robots, autonomous vehicles, and smart spaces need to understand what's... 21

Hugging Face Daily Papers research 1mo ago Frequency-Guided Action Diffusion via Sub-Frequency Manifold Traversal Abstract Frequency Guidance Operator enables smooth action generation in diffusion policies by steering noisy samples through intermediate sub-frequency manifolds, improving robotic manipulation performance. AI-generated summary Learning visuomotor policies via behavior cloning… 11

arXiv — NLP / Computation & Language research 1mo ago Multi-Turn Multi-Agent Dialogue for Collaborative Reconstruction Improves VLM Performance on Spatial Reasoning, But Only Barely arXiv:2605.31387v1 Announce Type: new Abstract: Robots operating in diverse environments rely on visual input to interpret objects and spatial layouts. In human-collaborative tasks, they are expected to communicate this understanding through language.
Vision-language models… 32

Hugging Face Daily Papers research 1mo ago Hide-and-Seek in Trajectories: Discovering Failure Signals for VLA Runtime Monitoring Abstract Hide-and-Seek framework detects robot execution failures in vision-language-action models by localizing failure-indicative actions through contrastive learning from trajectory-level supervision without step-level annotations. AI-generated summary Vision-Language-Action… 18

r/MachineLearning community 1mo ago Before we spend months processing open-source robotics datasets, tell us why this is a bad idea [D] Ps. Not pitching anything; Just trying to understand where reality differs from the narrative. We're a couple of ML students, mostly worked on ML/software before, but over the last few months we've been playing with VLAs, robot datasets, and trying to understand where the field… 27

Hugging Face Daily Papers research 1mo ago DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation Abstract DynaFLIP is a dynamics-aware multimodal pre-training framework that enhances robot manipulation by integrating motion understanding into visual perception through image-language-3D flow triplets and geometric regularization techniques. AI-generated summary Robot… 22

Ars Technica — AI news-outlet 1mo ago Startup offers free home cleaning—if it can record it all for robot training The latest twist in paying humans to wear head cameras for robot training data. 26

Hugging Face Daily Papers research 1mo ago Reducing Political Manipulation with Consistency Training Abstract Large language models demonstrate systematic political bias in handling opposing viewpoints, which can be mitigated through a reinforcement learning approach that maintains helpfulness while reducing bias. AI-generated summary Large language models (LLMs) exhibit… 18

Page 3 of 4 · 194 articles