Tag

Robotics

194 articles archived under #robotics

Hugging Face Daily Papers research 14d ago

Guava: An Effective and Universal Harness for Embodied Manipulation

Abstract A harness framework for embodied tool use combines high-level reasoning with external modules, enabling compact models to perform complex manipulation tasks with minimal training data. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Language models trained on large-scale…

15
Hacker News — AI on Front Page community 14d ago

A robot is sprinting towards you. Do you want it running on Claude or Grok?

Article URL: https://openrouter.ai/blog/insights/royale-last-agent-standing/ Comments URL: https://news.ycombinator.com/item?id=48576824 Points: 244 # Comments: 189

25
Ars Technica — AI news-outlet 14d ago

AI coding agents taught robots how to install GPUs and cut zip-ties

NVIDIA’s self-improvement program for robots enlists teams of AI coding agents.

13
TechCrunch — AI news-outlet 14d ago

Collecting robot training data is dirty, unglamorous work. Some AI labs are already paying XDOF to do it

If physical AI is going to match the accomplishments of LLMs, there's a data problem that needs to be solved.

33
Hugging Face official-blog 14d ago

From the Hugging Face Hub to robot hardware with Strands Agents and LeRobot

Enterprise article, published June 17, 2026. Authors: Sundar Raghavan (rsundaraws, amazon) and Cagatay Cali (cagataydev, amazon). A walkthrough of the LeRobot integration in Strands…

28
Hugging Face Daily Papers research 15d ago

Text-Vision Co-Instructed Image Editing

Abstract A unified text-visual image editing framework is presented that combines semantic intent from textual instructions with spatial guidance from visual prompts to achieve more precise and faithful image manipulation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Existing…

16
arXiv — NLP / Computation & Language research 15d ago

Incumbent Advantage: Brand Bias and Cognitive Manipulation Dynamics in LLM Recommendation Systems

arXiv:2606.17443v1 Announce Type: cross Abstract: Large language models (LLMs) are becoming a major way for consumers to find products, but we do not yet understand how brands compete in this new channel. We study brand dynamics in LLM recommendations using skincare products --…

11
MIT News — AI research 15d ago

Could AI tell you where you left your keys?

A new spatial memory system for robots efficiently captures details about the objects they see while exploring their environment.

17
Hugging Face Daily Papers research 15d ago

MotionVLA: Vision-Language-Action Model for Humanoid Motion

Abstract A dual-stream frequency tokenizer and autoregressive model are proposed to improve humanoid motion generation by separately encoding pose and physical dynamics, achieving better diversity and consistency compared to single-codebook approaches. Generated by…

11
Hugging Face Daily Papers research 15d ago

ACE-Ego-0: Unifying Egocentric Human and Robotic Data for VLA Pretraining

Abstract A unified Vision-Language-Action pretraining framework leverages heterogeneous data sources including human egocentric videos and robot trajectories through a reliability-aware training approach that improves performance on embodied AI tasks. Generated by…

6
r/MachineLearning community 15d ago

I built a leakage-clean verifier for robot manipulation, is this useful? Am I solving a non-problem? [D]

Spent the last few weeks on a benchmark/harness that tries to answer one question honestly: did a robot arm actually do the demonstrated task, or did the success metric just get fooled? The setup: compile a human demo into an object-centric graph (what changed in the world:…

7
Hugging Face Daily Papers research 15d ago

Human Universal Grasping

Abstract A flow-matching model generates diverse human grasps from RGB-D images, enabling zero-shot robotic grasping with improved performance over existing methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Humans can grasp objects effortlessly, whereas multi-fingered robots…

25
Hugging Face Daily Papers research 15d ago

LaWAM: Latent World Action Models for Efficient Dynamics-Aware Robot Policies

Abstract LaWAM enables efficient robot control by predicting compact latent visual subgoals instead of expensive video generation, achieving high performance with reduced computational latency. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Vision-Language-Action models (VLAs)…

33
r/LocalLLaMA community 15d ago

Qwen Robot Suite

Looks pretty cool... https://qwen.ai/blog?id=qwen-robotsuite (submitted by /u/Snoo_27681)

8
Hugging Face Daily Papers research 15d ago

Qwen-RobotWorld Technical Report: Unifying Embodied World Modeling through Language-Conditioned Video Generation

Abstract Qwen-RobotWorld is a language-conditioned video world model that predicts future visual trajectories across multiple robotic domains using a double-stream diffusion transformer and embodied world knowledge corpus. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We…

5
Hugging Face Daily Papers research 16d ago

Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes

Abstract Hierarchical Advantage-Weighted Behavior Cloning (HABC) addresses sparse reward challenges in robot learning by separately optimizing viability and efficiency objectives through adaptive critic heads and intervention-aware credit assignment, significantly improving…

9
Hugging Face Daily Papers research 16d ago

Geometric Action Model for Robot Policy Learning

Abstract A geometric action model leverages pretrained geometric foundation models to enable language-conditioned manipulation policies with improved accuracy, robustness, and efficiency in 3D physical environments. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Generalist robot…

21
arXiv — Machine Learning research 16d ago

Phase-Localized Curation Does Not Help: A Negative Result on Per-Phase Metric Selection for Demonstration Filtering

arXiv:2606.15064v1 Announce Type: new Abstract: Manipulation demonstrations have temporal phase structure, and a natural hypothesis is that demonstration-curation metrics should be applied within phases rather than globally. The idea is to segment each trajectory into phases,…

10
arXiv — NLP / Computation & Language research 16d ago

Beyond English: Uncovering the Multilingual Gap in Vision-Language-Action Models

arXiv:2606.15714v1 Announce Type: new Abstract: Vision-Language-Action models have recently demonstrated promising capabilities in learning generalist robot policies from large-scale multimodal data. However, most existing VLA systems are trained and evaluated primarily with…

12
NVIDIA Developer Blog official-blog 16d ago

Pretrained to Imagine, Fine-Tuned to Act: The Rise of World-Action Models

Quick glossary for readers new to VLA/WAM terminology. VLA (Vision-Language-Action model): a robot policy that starts from a pretrained VLM backbone and adapts it...

22
arXiv — Machine Learning research 17d ago

SpikF-GO: Spiking Fourier Graph Operators for Multivariate Time Series Forecasting

arXiv:2606.13901v1 Announce Type: new Abstract: Spiking Neural Networks (SNNs) have emerged as an energy-efficient alternative to conventional neural networks, demonstrating strong performance in computer vision and robotics. More recently, SNNs have been applied to time series…

30
arXiv — Machine Learning research 17d ago

More with LESS -- Local Scene Representations for Tactile Imaging

arXiv:2606.14344v1 Announce Type: new Abstract: Tactile imaging seeks to reconstruct the internal structure of soft objects through touch sensing, with applications in medical diagnosis and robotic manipulation. Recent self-supervised learning approaches have shown promising…

33
arXiv — NLP / Computation & Language research 17d ago

Persuasion Index: A Theory-Guided Framework for Persuasion Analysis

arXiv:2606.14580v1 Announce Type: new Abstract: Identifying persuasive rhetorical cues is critical across domains, from detecting information manipulation and improving AI safety to advancing public health communication. We propose Persuasion Index (PI), a taxonomy of 15…

36
Ars Technica — AI news-outlet 19d ago

Here's what Jeff Bezos' new startup Prometheus will do

It isn't the only startup tackling physical AI, but it's one of the best-funded.

5
Ars Technica — AI news-outlet 19d ago

Ukraine's one-time test used fully autonomous drones to kill Russian soldiers

Full autonomy is rare, but Ukraine is installing AI modules on drones and robots.

32
Hugging Face Daily Papers research 19d ago

WEAVER, Better, Faster, Longer: An Effective World Model for Robotic Manipulation

Abstract WEAVER is a multi-view world model architecture that achieves high fidelity, consistency, and efficiency in robotic manipulation tasks through flow-matching loss and demonstrates superior performance in policy evaluation, improvement, and test-time planning. Generated…

27
Hugging Face Daily Papers research 19d ago

Revisiting Articulated Parts Perception in Robot Manipulation

Abstract A new geometric representation called Geometric Primary Structure (GPS) is introduced for articulated parts perception, enabling efficient data collection through VR annotation and achieving high manipulation success rates without fine-tuning. Generated by…

27
Hugging Face Daily Papers research 20d ago

LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories

Abstract LabVLA, a vision-language-action model trained with a two-stage approach combining action token pretraining and flow matching, demonstrates superior performance on laboratory automation tasks through simulated data generation and robot-specific learning. Generated by…

18
arXiv — NLP / Computation & Language research 20d ago

Does AI Reviewer See the Full Picture? Attacking and Defending Multimodal Peer Review

arXiv:2606.12716v1 Announce Type: new Abstract: The integration of Large Language Models (LLMs) and Multimodal LLMs (MLLMs) into scientific peer-review workflows introduces novel and significant risks for adversarial manipulation, especially given the multimodal nature of…

8
arXiv — NLP / Computation & Language research 20d ago

ComAct: Reframing Professional Software Manipulation via COM-as-Action Paradigm

arXiv:2606.13239v1 Announce Type: cross Abstract: Existing computer-use agents remain fundamentally limited in professional software manipulation: GUI-based agents suffer from fragile visual grounding and long-horizon error accumulation, while API-based approaches struggle with…

34
Hugging Face Daily Papers research 20d ago

MuJoCo-Drones-Gym: A GPU-Accelerated Multi-Drone Simulator for Control and Reinforcement Learning

Abstract A Gymnasium-compatible multi-drone simulation environment built on MuJoCo physics engine that supports flexible physics models, action interfaces, and observation spaces for reinforcement learning applications. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Robotic…

35
TechCrunch — AI news-outlet 20d ago

Theker just raised $85M to build the factory robot that doesn’t specialize in anything

Unlike humanoid robots designed around a fixed form — think Boston Dynamics — Theker's machines are built to be reconfigured.

18
TechCrunch — AI news-outlet 20d ago

Jeff Bezos’s Prometheus raises $12B to build an ‘artificial general engineer’ for the physical world

The new round values the physical AI startup that aims to automate heavy engineering and drug design at $41 billion.

31
r/LocalLLaMA community 20d ago

Refiner: Robotics library from the ex-Hugging Face pre-training team

The ex-Hugging Face pre-training team just announced a new library created for robotics data refinement! It supports ingestion of all robotics formats (Parquet, HDF5, MCAP, Zarr, RLDS, and LeRobot), as well as the common processing flows like visual hand-tracking, subtask annotations…

26
arXiv — Machine Learning research 21d ago

Implicit Neural Representations of Individual Behavior

arXiv:2606.12200v1 Announce Type: new Abstract: We study policy representation learning from unlabeled multi-policy behavioral data. Each episode is generated by a fixed policy, but policy labels are unavailable. This setting appears in robotics play, demonstrations, games,…

28
arXiv — Machine Learning research 21d ago

Fourier Features Let Agents Learn High Precision Policies with Imitation Learning

arXiv:2606.12334v1 Announce Type: new Abstract: High-precision robotic manipulation requires fine-grained spatial reasoning that is often difficult to achieve with RGB-only policies due to depth ambiguity and perspective scale issues. Policies that leverage 3D information…

14
arXiv — NLP / Computation & Language research 21d ago

Detecting AI-Generated Content on Social Media with Multi-modal Language Models

arXiv:2606.11200v1 Announce Type: new Abstract: Generative AI has enabled the creation of photorealistic images and videos that are increasingly disseminated on social media, often used for spam, misinformation, manipulation, and fraud. Existing AI-generated content (AIGC)…

36
arXiv — NLP / Computation & Language research 21d ago

When Does Language Matter? Multilingual Instructions Reveal Step-wise Language Sensitivity in Vision-Language-Action Models

arXiv:2606.11906v1 Announce Type: new Abstract: Vision-Language-Action (VLA) models have shown strong performance in language-conditioned robotic manipulation, yet their robustness to linguistic variation remains poorly understood. In this work, we present the first systematic…

17
Hugging Face Daily Papers research 21d ago

World Pilot: Steering Vision-Language-Action Models with World-Action Priors

Abstract World Pilot enhances Vision-Language-Action models by incorporating dynamic scene evolution and trajectory priors from a World-Action Model, achieving superior performance in zero-shot out-of-distribution manipulation tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

10
Hugging Face Daily Papers research 22d ago

BrainSurgery: Reproducible and Reliable Declarative Weight Manipulations for Model Editing and Upcycling

Abstract BrainSurgery is a tool for robust and reproducible tensor manipulation of neural network checkpoints through declarative YAML plans with built-in validation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct As deep learning models scale, managing, inspecting, and modifying…

12
arXiv — Machine Learning research 22d ago

Co-GLANCE: Uncertainty-Aware Active Perception for Heterogeneous Robot Teaming

arXiv:2606.09919v1 Announce Type: new Abstract: Perceptual uncertainty is a central challenge for heterogeneous robot teams operating in unstructured outdoor environments, where no single viewpoint affords reliable scene understanding. Perceptual uncertainty, arising from…

36
arXiv — NLP / Computation & Language research 22d ago

TabClaw: An Interactive and Self-Evolving Agent for Spreadsheet Manipulation and Table Reasoning

arXiv:2606.10316v1 Announce Type: new Abstract: Spreadsheets and tables are widely used representations for structured data analysis, but effective analysis still requires substantial manual effort and domain expertise. Recent large language model (LLM) agents can automate parts…

31
arXiv — NLP / Computation & Language research 22d ago

Beyond APIs: Probing the Limits of MLLMs in Physical Tool Use

arXiv:2606.10803v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) excel at utilizing digital APIs and increasingly serve as the "brain" of embodied AI, instructing robots to interact with the physical world. In such embodied settings, a central capability…

38
Hugging Face Daily Papers research 22d ago

ABot-Earth 0.5: Generative 3D Earth Model

Abstract ABot-Earth 0.5 generates realistic 3D environments from satellite imagery using 3D Gaussian Splatting representation, enabling fast synthesis and real-time visualization for Embodied AI applications. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We present ABot-Earth…

22
Hugging Face Daily Papers research 22d ago

VoLo: A Physical Orchestrator for Open-Vocabulary Long-Horizon Manipulation

Abstract VoLoAgent enables physical orchestration by integrating vision-language models with robot capabilities for open-vocabulary long-horizon manipulation tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Open-vocabulary long-horizon manipulation requires robots to reason…

24
The Information — AI news-outlet 22d ago

Kalshi Asks Some Customers For Employer Information

Prediction markets platform Kalshi is asking customers in some wagers to provide the name of their employer, industry and job function before making bets, to help the company crack down on potential insider trading. “For markets with heightened insider or manipulation risk, we…

23
TechCrunch — AI news-outlet 22d ago

Hey Siri, here’s what I actually want from AI

I'm desperate for a personal AI assistant, but do I really want to become the kind of person who can't function without the friendly robot voice in my phone?

4
Hugging Face Daily Papers research 22d ago

Robotic Policy Adaptation via Weight-Space Meta-Learning

Abstract WIZARD is a weight-space meta-learning framework that generates task-specific LoRA parameters for frozen VLA policies using language instructions and demonstration videos, enabling efficient task adaptation without fine-tuning. Generated by…

31
Hugging Face Daily Papers research 22d ago

Light-WAM: Efficient World Action Models with State-Fusion Action Decoding

Abstract Light-WAM is a lightweight world action model for robot manipulation that uses a compact video backbone and downsampled latent space for efficient future-video supervision, combined with a StateFusionActionExpert for direct action prediction. Generated by…

25
Google DeepMind official-blog 22d ago

Powering the future of robotics in Europe

Jun 09, 2026 · Google DeepMind Accelerator selects 15 robotics companies from across Europe to join the program. Providing 3 months of intensive mentorship and technical support, enabling the…

22
