Tag

Robotics

194 articles archived under #robotics

Hugging Face Daily Papers research 22d ago

WorldCraft: From Camera Navigation to Object Manipulation in Interactive Video World Models

Abstract WorldCraft extends interactive video world models to enable object-level trajectory control while maintaining camera navigation capabilities through specialized control pipelines. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Recent video-based world models have made…

27
r/LocalLLaMA community 22d ago

Jetson Orin NX Build for Hermes Agent + Benchmarking

I had a huge LLM server, and now I have a tiny one! I had a Jetson Orin NX gathering dust from a long dead robotics project, from back in the Llama-7B days. I figured now with MoE and smaller models doing well, it was time to mess with it again. Goal: As silent as possible…

34
The Information — AI news-outlet 23d ago

U.S. Accuses Alibaba, Baidu, Others of Aiding Chinese Military in Blacklist Move

The U.S. Department of Defense on Monday added more than a dozen Chinese tech companies including Alibaba and Baidu to a blacklist, a move that could further escalate tensions between the world’s two largest economies. Electric vehicle makers Byd and Nio, humanoid maker Unitree,…

25
Hugging Face Daily Papers research 23d ago

AHA-WAM: Asynchronous Horizon-Adaptive World-Action Modeling with Observation-Guided Context Routing

Abstract AHA-WAM is an asynchronous world-action model that uses dual Diffusion Transformers to enable efficient long-horizon planning and real-time action execution in robotic manipulation tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct World-action models have emerged as a…

4
Hugging Face Daily Papers research 23d ago

OASIS: From Simulation Data Collection to Real-World Humanoid Loco-Manipulation

Abstract A simulation-data-driven framework for humanoid loco-manipulation that uses 3D generative models to create realistic assets and hierarchical visuomotor policies trained on simulated data achieves better zero-shot performance than real-robot training. Generated by…

24
Hugging Face Daily Papers research 23d ago

Robots Need More than VLA and World Models

Abstract Robot intelligence advancement requires integrating unstructured behavioral data through specialized interfaces for labeling, embodiment mapping, world modeling, and reward inference rather than relying solely on policy scaling. Generated by…

27
arXiv — NLP / Computation & Language research 24d ago

The Sim-to-Real Gap of Foundation Model Agents: A Unified MDP Perspective

arXiv:2606.07017v1 Announce Type: cross Abstract: Foundation model agents are increasingly deployed for real-world decision-making, but suffer from the sim-to-real gap. While robotics and classical control have mature frameworks to address this gap, the foundation model…

5
Hugging Face Daily Papers research 24d ago

LIMMT: Less is More for Motion Tracking

Abstract Training with high-quality motion data improves tracking policy optimization trajectories, with minimal data subsets outperforming full datasets in physics-based humanoid motion tracking. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We argue that high-quality motion…

24
Hugging Face Daily Papers research 26d ago

The Road Ahead in Autonomous Driving: The KITScenes Multimodal Dataset

Abstract KITScenes Multimodal dataset provides high-fidelity European driving data with comprehensive 3D maps and diverse urban environments for embodied AI research. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Existing autonomous driving datasets have enabled major progress,…

36
Hugging Face Daily Papers research 26d ago

AffordanceVLA: A Vision-Language-Action Model Empowering Action Generation through Affordance-Aware Understanding

Abstract AffordanceVLA introduces a unified framework that uses structured affordance forecasting as an intermediate representation to improve the precision of perception-action mapping in robotic manipulation by leveraging vision-language models. Generated by…

4
r/MachineLearning community 26d ago

I'm looking to join/form a team working on physical AI robotics challenge [P]

Hey all, I'm a robotics engineer by training turned ML/AI engineer because of passion right after school. I want to start combining these skills together and I think a competition is the best way of doing it. Here's an example of a challenge I'm talking about to set expectations…

18
Hugging Face Daily Papers research 26d ago

World-Language-Action Model for Unified World Modeling, Language Reasoning, and Action Synthesis

Abstract World-language-action models combine textual instruction processing with robot state prediction through an autoregressive transformer backbone, enabling efficient long-horizon task execution and cross-embodiment learning. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We…

7
Hugging Face Daily Papers research 27d ago

Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation?

Abstract Video generation models were evaluated through robotic manipulation tasks to assess their ability to reflect physical reality, revealing that visual quality does not predict executable motion accuracy. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Video generation models…

20
Hugging Face Daily Papers research 27d ago

SEAOTTER: Sensor Embedded Autoencoding with One-Time Transcode for Efficient Reconstruction

Abstract A compression framework for cloud robotics combines learned latent representations with standard JPEG compatibility to achieve faster encoding and decoding while maintaining high perceptual quality. Generated by Qwen/Qwen2.5-Coder-32B-Instruct In robotics systems, vast…

31
r/MachineLearning community 27d ago

Would you say capture-time semantic annotation for robot trajectories is a solved problem? [R]

It seems raw teleoperation data (RGB + joint states) structurally lacks affordance, contact intent, and embodiment-specific kinematic context. (information that can't be reliably recovered post-hoc once the demonstration is recorded) Most current approaches either filter/clean…

11
Hugging Face Daily Papers research 27d ago

RobotValues: Evaluating Household Robots When Human Values Conflict

Abstract RobotValues benchmark evaluates household robot planners in value-conflict scenarios, revealing that vision-language models exhibit default value preferences and struggle to override them when instructed to prioritize conflicting values. Generated by…

8
arXiv — Machine Learning research 27d ago

Flash-WAM: Modality-Aware Distillation for World Action Models

arXiv:2606.05254v1 Announce Type: new Abstract: World-action models (WAMs) jointly generate future video and robot actions through iterative diffusion, achieving strong performance on manipulation benchmarks but requiring tens of denoising steps, a cost that precludes real-time…

13
arXiv — Machine Learning research 27d ago

What Objects Enable, Not What They Are: Functional Latent Spaces for Affordance Reasoning

arXiv:2606.05533v1 Announce Type: new Abstract: Existing robot planning systems rely on appearance-based reasoning, where visual observations are encoded into latent spaces organized around object appearances (e.g., recognizing a "cart" based on how it looks). However, planning…

13
Ars Technica — AI news-outlet 27d ago

The skeptic’s guide to humanoid robots going viral on the Internet

Robot demonstrations can distort public perceptions of robotic capabilities.

9
Dwarkesh Podcast news-outlet 27d ago

Alex Imas and Phil Trammell – What remains scarce after AGI?

“One robot now turns into many robots next year, but the number of ballerinas is the same.”

37
TechCrunch — AI news-outlet 27d ago

Is Silicon Valley ready to put robots in people’s homes? Hello Robot is.

The California startup released the fourth generation of its home assistance robot, Stretch.

30
Hugging Face Daily Papers research 27d ago

PaintBench: Deterministic Evaluation of Precise Visual Editing

Abstract PaintBench presents a scalable benchmark for precise visual editing tasks, revealing low performance across models and identifying key challenges in geometric transformations and structural manipulations. Generated by Qwen/Qwen2.5-Coder-32B-Instruct While current…

12
Hugging Face Daily Papers research 28d ago

Cosmos 3: Omnimodal World Models for Physical AI

Abstract Cosmos 3 is an omnimodal world model that processes and generates multiple data types through a unified mixture-of-transformers architecture, achieving state-of-the-art performance in various understanding and generation tasks. Generated by…

38
Hugging Face Daily Papers research 28d ago

OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs

Abstract OVO-S-Bench presents a comprehensive benchmark for evaluating streaming spatial intelligence in multimodal language models through human-annotated questions spanning multiple abstraction levels. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multimodal agents in robotics,…

23
arXiv — NLP / Computation & Language research 28d ago

Hybrid Adversarial Defence for Natural Language Understanding Tasks

arXiv:2606.04612v1 Announce Type: new Abstract: Large Language Models (LLMs) are vulnerable both to hallucination and adversarial manipulation. Although these problems are closely related, existing defences typically address them separately. We investigate a hybrid defence…

21
arXiv — NLP / Computation & Language research 28d ago

Dive into the Scene: Breaking the Perceptual Bottleneck in Vision-Language Decision Making via Focus Plan Generation

arXiv:2606.04046v1 Announce Type: cross Abstract: In embodied vision-language decision making tasks such as robotic manipulation and navigation, Vision-Language and Vision-Language-Action Models (VLMs & VLAs) are powerful tools with different benefits: VLMs are better at…

31
Hugging Face Daily Papers research 28d ago

GRAIL: Generating Humanoid Loco-Manipulation from 3D Assets and Video Priors

Abstract GRAIL generates diverse humanoid manipulation and locomotion data through 3D asset composition and video foundation models, enabling effective sim-to-real transfer for robot control. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Scaling humanoid loco-manipulation…

9
Hugging Face Daily Papers research 28d ago

AURA: Action-Gated Memory for Robot Policies at Constant VRAM

Abstract AURA-Mem is a recurrent memory system that adapts to embodied AI constraints by writing only when observations affect actions, significantly reducing memory writes compared to traditional KV-cache approaches. Generated by Qwen/Qwen2.5-Coder-32B-Instruct The KV-cache is…

7
Hugging Face Daily Papers research 28d ago

Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking

Abstract Humanoid-GPT is a GPT-style Transformer with causal attention trained on a billion-scale motion corpus that achieves zero-shot generalization to unseen motions and control tasks through scalable pre-training on diverse motion data. Generated by…

29
Hugging Face Daily Papers research 29d ago

AFUN: Towards an Affordance Foundation Model for Functionality Understanding

Abstract Affordance understanding model predicts functional masks and 3D motion curves from RGB-D observations and language descriptions, enabling generalizable robot manipulation across diverse environments. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Affordance understanding…

36
Hugging Face Daily Papers research 29d ago

τ_0-WM: A Unified Video-Action World Model for Robotic Manipulation

Abstract A unified video-action world model integrates policy learning, video prediction, and action evaluation using a shared video diffusion backbone for robotic manipulation tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Robotic manipulation requires models that generate…

22
Hugging Face Daily Papers research 1mo ago

Silent Failures in Physical AI: A Literature Review of Runtime Action Authorization for Autonomous Systems

Abstract Physical AI systems face safety challenges where black-box models can execute harmful actions without detection, necessitating comprehensive runtime guardrail mechanisms for safe operation. AI-generated summary Physical AI systems increasingly map multimodal…

12
Hugging Face Daily Papers research 1mo ago

Can Predicted Dynamics Exist in the Physical World?

Abstract Physical admissibility validation for AI systems uses prediction-control interfaces with kinematic and dynamic conditions to filter invalid proposals while maintaining high performance. AI-generated summary Predictive Physical AI systems output state rollouts, action…

33
arXiv — Machine Learning research 1mo ago

From Demonstrations to Rewards: Test-Time Prompt Optimization for VLM Reward Models

arXiv:2606.00083v1 Announce Type: new Abstract: Reinforcement learning relies on accurate reward functions, which are often hand-crafted or even unavailable in real-world applications, such as robotics. Recent work has explored the zero-shot reasoning capabilities of pre-trained…

11
arXiv — NLP / Computation & Language research 1mo ago

DiscourseFlip: An Oblique Discourse-Level Opinion Manipulation Attack against Black-box Retrieval-Augmented Generation

arXiv:2606.01212v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) systems are widely deployed and increasingly influential, but their reliance on external corpora exposes new security risks from poisoned retrieval content. Existing RAG attacks are largely…

23
Hugging Face Daily Papers research 1mo ago

RoboSemanticBench: Diagnosing Semantic Grounding in Action Prediction for VLA Models

Abstract RoboSemanticBench identifies a disconnect between semantic understanding and action prediction in vision-language-action models, where robots can grasp objects but fail to select semantically correct targets. AI-generated summary Vision-language-action (VLA) models are…

15
Hugging Face Daily Papers research 1mo ago

RoboStressBench: Benchmarking VLM Robustness to Physical Visual Stress in Embodied Scenes

Abstract RoboStressBench presents a principled benchmark for evaluating vision-language model robustness to physical visual stress in embodied AI, decomposing visual stress into material, viewpoint, lighting, and geometry dimensions. AI-generated summary Vision-Language Models…

4
r/LocalLLaMA community 1mo ago

NVIDIA GB300 Grace Blackwell Ultra pricetags

https://www.scan.co.uk/shop/ai-and-robotics/workstations-ai/nvidia-dgx-station (submitted by /u/X-N2O)

5
Ars Technica — AI news-outlet 1mo ago

Allegedly trashing Airbnbs to test robots puts startup in legal trouble

Lawsuit seeks $12,000 from startup that allegedly damaged home in robot tests.

28
Hugging Face Daily Papers research 1mo ago

Memory-Bound but Not Bandwidth-Limited: The Physical AI Inference Gap in Batch-1 LLM Decode

Abstract Batch-1 autoregressive decoding in physical AI systems shows that memory bandwidth alone doesn't fully explain latency, with GPU speedup limited by launch overheads and quantization efficiency varying significantly across hardware platforms. AI-generated summary…

16
r/LocalLLaMA community 1mo ago

How to build a shitty robot

Submitted by /u/badlogicgames

35
Hugging Face official-blog 1mo ago

Welcome NVIDIA Cosmos 3: The First Open Omni-model for Physical AI Reasoning and Action

Published June 1, 2026, by Asawaree (asawareeb, NVIDIA) and Atharva Joshi (atharvajoshi10, NVIDIA). NVIDIA Cosmos 3 is here, and it's available on Hugging…

23
NVIDIA Developer Blog official-blog 1mo ago

Develop Physical AI Reasoning, World, and Action Models with NVIDIA Cosmos 3

Physical AI systems must understand the real world before they can act within it. Robots, autonomous vehicles, and smart spaces need to understand what's...

21
Hugging Face Daily Papers research 1mo ago

Frequency-Guided Action Diffusion via Sub-Frequency Manifold Traversal

Abstract Frequency Guidance Operator enables smooth action generation in diffusion policies by steering noisy samples through intermediate sub-frequency manifolds, improving robotic manipulation performance. AI-generated summary Learning visuomotor policies via behavior cloning…

11
arXiv — NLP / Computation & Language research 1mo ago

Multi-Turn Multi-Agent Dialogue for Collaborative Reconstruction Improves VLM Performance on Spatial Reasoning, But Only Barely

arXiv:2605.31387v1 Announce Type: new Abstract: Robots operating in diverse environments rely on visual input to interpret objects and spatial layouts. In human-collaborative tasks, they are expected to communicate this understanding through language. Vision-language models…

32
Hugging Face Daily Papers research 1mo ago

Hide-and-Seek in Trajectories: Discovering Failure Signals for VLA Runtime Monitoring

Abstract Hide-and-Seek framework detects robot execution failures in vision-language-action models by localizing failure-indicative actions through contrastive learning from trajectory-level supervision without step-level annotations. AI-generated summary Vision-Language-Action…

18
r/MachineLearning community 1mo ago

Before we spend months processing open-source robotics datasets, tell us why this is a bad idea [D]

P.S. Not pitching anything, just trying to understand where reality differs from the narrative. We're a couple of ML students who mostly worked on ML/software before, but over the last few months we've been playing with VLAs, robot datasets, and trying to understand where the field…

27
Hugging Face Daily Papers research 1mo ago

DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation

Abstract DynaFLIP is a dynamics-aware multimodal pre-training framework that enhances robot manipulation by integrating motion understanding into visual perception through image-language-3D flow triplets and geometric regularization techniques. AI-generated summary Robot…

22
Ars Technica — AI news-outlet 1mo ago

Startup offers free home cleaning—if it can record it all for robot training

The latest twist in paying humans to wear head cameras for robot training data.

26
Hugging Face Daily Papers research 1mo ago

Reducing Political Manipulation with Consistency Training

Abstract Large language models demonstrate systematic political bias in handling opposing viewpoints, which can be mitigated through a reinforcement learning approach that maintains helpfulness while reducing bias. AI-generated summary Large language models (LLMs) exhibit…

18