Tag: Robotics
194 articles archived under #robotics

Hugging Face Daily Papers · research · 14d ago
Guava: An Effective and Universal Harness for Embodied Manipulation
Abstract: A harness framework for embodied tool use combines high-level reasoning with external modules, enabling compact models to perform complex manipulation tasks with minimal training data. Generated by Qwen/Qwen2.5-Coder-32B-Instruct. Language models trained on large-scale… 15

Hacker News — AI on Front Page · community · 14d ago
A robot is sprinting towards you. Do you want it running on Claude or Grok?
Article URL: https://openrouter.ai/blog/insights/royale-last-agent-standing/ Comments URL: https://news.ycombinator.com/item?id=48576824 Points: 244 # Comments: 189 25

Ars Technica — AI · news-outlet · 14d ago
AI coding agents taught robots how to install GPUs and cut zip-ties
NVIDIA’s self-improvement program for robots enlists teams of AI coding agents. 13

TechCrunch — AI · news-outlet · 14d ago
Collecting robot training data is dirty, unglamorous work. Some AI labs are already paying XDOF to do it
If physical AI is going to match the accomplishments of LLMs, there's a data problem that needs to be solved. 33

Hugging Face · official-blog · 14d ago
From the Hugging Face Hub to robot hardware with Strands Agents and LeRobot
Published June 17, 2026. Sundar Raghavan (rsundaraws, amazon), Cagatay Cali (cagataydev, amazon). A walkthrough of the LeRobot integration in Strands… 28

Hugging Face Daily Papers · research · 15d ago
Text-Vision Co-Instructed Image Editing
Abstract: A unified text-visual image editing framework is presented that combines semantic intent from textual instructions with spatial guidance from visual prompts to achieve more precise and faithful image manipulation.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct. Existing… 16

arXiv — NLP / Computation & Language · research · 15d ago
Incumbent Advantage: Brand Bias and Cognitive Manipulation Dynamics in LLM Recommendation Systems
arXiv:2606.17443v1 Announce Type: cross Abstract: Large language models (LLMs) are becoming a major way for consumers to find products, but we do not yet understand how brands compete in this new channel. We study brand dynamics in LLM recommendations using skincare products --… 11

MIT News — AI · research · 15d ago
Could AI tell you where you left your keys?
A new spatial memory system for robots efficiently captures details about the objects they see while exploring their environment. 17

Hugging Face Daily Papers · research · 15d ago
MotionVLA: Vision-Language-Action Model for Humanoid Motion
Abstract: A dual-stream frequency tokenizer and autoregressive model are proposed to improve humanoid motion generation by separately encoding pose and physical dynamics, achieving better diversity and consistency compared to single-codebook approaches. Generated by… 11

Hugging Face Daily Papers · research · 15d ago
ACE-Ego-0: Unifying Egocentric Human and Robotic Data for VLA Pretraining
Abstract: A unified Vision-Language-Action pretraining framework leverages heterogeneous data sources including human egocentric videos and robot trajectories through a reliability-aware training approach that improves performance on embodied AI tasks. Generated by… 6

r/MachineLearning · community · 15d ago
I built a leakage-clean verifier for robot manipulation, is this useful? Am I solving a non-problem? [D]
Spent the last few weeks on a benchmark/harness that tries to answer one question honestly: did a robot arm actually do the demonstrated task, or did the success metric just get fooled?
The setup: compile a human demo into an object-centric graph (what changed in the world:… 7

Hugging Face Daily Papers · research · 15d ago
Human Universal Grasping
Abstract: A flow-matching model generates diverse human grasps from RGB-D images, enabling zero-shot robotic grasping with improved performance over existing methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct. Humans can grasp objects effortlessly, whereas multi-fingered robots… 25

Hugging Face Daily Papers · research · 15d ago
LaWAM: Latent World Action Models for Efficient Dynamics-Aware Robot Policies
Abstract: LaWAM enables efficient robot control by predicting compact latent visual subgoals instead of expensive video generation, achieving high performance with reduced computational latency. Generated by Qwen/Qwen2.5-Coder-32B-Instruct. Vision-Language-Action models (VLAs)… 33

r/LocalLLaMA · community · 15d ago
Qwen Robot Suite
Looks pretty cool... https://qwen.ai/blog?id=qwen-robotsuite submitted by /u/Snoo_27681 8

Hugging Face Daily Papers · research · 15d ago
Qwen-RobotWorld Technical Report: Unifying Embodied World Modeling through Language-Conditioned Video Generation
Abstract: Qwen-RobotWorld is a language-conditioned video world model that predicts future visual trajectories across multiple robotic domains using a double-stream diffusion transformer and embodied world knowledge corpus.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct. We… 5

Hugging Face Daily Papers · research · 16d ago
Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes
Abstract: Hierarchical Advantage-Weighted Behavior Cloning (HABC) addresses sparse reward challenges in robot learning by separately optimizing viability and efficiency objectives through adaptive critic heads and intervention-aware credit assignment, significantly improving… 9

Hugging Face Daily Papers · research · 16d ago
Geometric Action Model for Robot Policy Learning
Abstract: A geometric action model leverages pretrained geometric foundation models to enable language-conditioned manipulation policies with improved accuracy, robustness, and efficiency in 3D physical environments. Generated by Qwen/Qwen2.5-Coder-32B-Instruct. Generalist robot… 21

arXiv — Machine Learning · research · 16d ago
Phase-Localized Curation Does Not Help: A Negative Result on Per-Phase Metric Selection for Demonstration Filtering
arXiv:2606.15064v1 Announce Type: new Abstract: Manipulation demonstrations have temporal phase structure, and a natural hypothesis is that demonstration-curation metrics should be applied within phases rather than globally. The idea is to segment each trajectory into phases,… 10

arXiv — NLP / Computation & Language · research · 16d ago
Beyond English: Uncovering the Multilingual Gap in Vision-Language-Action Models
arXiv:2606.15714v1 Announce Type: new Abstract: Vision-Language-Action models have recently demonstrated promising capabilities in learning generalist robot policies from large-scale multimodal data. However, most existing VLA systems are trained and evaluated primarily with… 12

NVIDIA Developer Blog · official-blog · 16d ago
Pretrained to Imagine, Fine-Tuned to Act: The Rise of World-Action Models
Quick glossary for readers new to VLA/WAM terminology. VLA (Vision-Language-Action model): a robot policy that starts from a pretrained VLM backbone and adapts it...
22

arXiv — Machine Learning · research · 17d ago
SpikF-GO: Spiking Fourier Graph Operators for Multivariate Time Series Forecasting
arXiv:2606.13901v1 Announce Type: new Abstract: Spiking Neural Networks (SNNs) have emerged as an energy-efficient alternative to conventional neural networks, demonstrating strong performance in computer vision and robotics. More recently, SNNs have been applied to time series… 30

arXiv — Machine Learning · research · 17d ago
More with LESS -- Local Scene Representations for Tactile Imaging
arXiv:2606.14344v1 Announce Type: new Abstract: Tactile imaging seeks to reconstruct the internal structure of soft objects through touch sensing, with applications in medical diagnosis and robotic manipulation. Recent self-supervised learning approaches have shown promising… 33

arXiv — NLP / Computation & Language · research · 17d ago
Persuasion Index: A Theory-Guided Framework for Persuasion Analysis
arXiv:2606.14580v1 Announce Type: new Abstract: Identifying persuasive rhetorical cues is critical across domains, from detecting information manipulation and improving AI safety to advancing public health communication. We propose Persuasion Index (PI), a taxonomy of 15… 36

Ars Technica — AI · news-outlet · 19d ago
Here's what Jeff Bezos' new startup Prometheus will do
It isn't the only startup tackling physical AI, but it's one of the best-funded. 5

Ars Technica — AI · news-outlet · 19d ago
Ukraine's one-time test used fully autonomous drones to kill Russian soldiers
Full autonomy is rare, but Ukraine is installing AI modules on drones and robots. 32

Hugging Face Daily Papers · research · 19d ago
WEAVER, Better, Faster, Longer: An Effective World Model for Robotic Manipulation
Abstract: WEAVER is a multi-view world model architecture that achieves high fidelity, consistency, and efficiency in robotic manipulation tasks through flow-matching loss and demonstrates superior performance in policy evaluation, improvement, and test-time planning.
Generated… 27

Hugging Face Daily Papers · research · 19d ago
Revisiting Articulated Parts Perception in Robot Manipulation
Abstract: A new geometric representation called Geometric Primary Structure (GPS) is introduced for articulated parts perception, enabling efficient data collection through VR annotation and achieving high manipulation success rates without fine-tuning. Generated by… 27

Hugging Face Daily Papers · research · 20d ago
LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories
Abstract: LabVLA, a vision-language-action model trained with a two-stage approach combining action token pretraining and flow matching, demonstrates superior performance on laboratory automation tasks through simulated data generation and robot-specific learning. Generated by… 18

arXiv — NLP / Computation & Language · research · 20d ago
Does AI Reviewer See the Full Picture? Attacking and Defending Multimodal Peer Review
arXiv:2606.12716v1 Announce Type: new Abstract: The integration of Large Language Models (LLMs) and Multimodal LLMs (MLLMs) into scientific peer-review workflows introduces novel and significant risks for adversarial manipulation, especially given the multimodal nature of… 8

arXiv — NLP / Computation & Language · research · 20d ago
ComAct: Reframing Professional Software Manipulation via COM-as-Action Paradigm
arXiv:2606.13239v1 Announce Type: cross Abstract: Existing computer-use agents remain fundamentally limited in professional software manipulation: GUI-based agents suffer from fragile visual grounding and long-horizon error accumulation, while API-based approaches struggle with… 34

Hugging Face Daily Papers · research · 20d ago
MuJoCo-Drones-Gym: A GPU-Accelerated Multi-Drone Simulator for Control and Reinforcement Learning
Abstract: A Gymnasium-compatible multi-drone simulation environment built on the MuJoCo physics engine that supports flexible physics models, action interfaces, and observation spaces for reinforcement learning applications.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct. Robotic… 35

TechCrunch — AI · news-outlet · 20d ago
Theker just raised $85M to build the factory robot that doesn’t specialize in anything
Unlike humanoid robots designed around a fixed form (think Boston Dynamics), Theker's machines are built to be reconfigured. 18

TechCrunch — AI · news-outlet · 20d ago
Jeff Bezos’s Prometheus raises $12B to build an ‘artificial general engineer’ for the physical world
The new round values the physical AI startup that aims to automate heavy engineering and drug design at $41 billion. 31

r/LocalLLaMA · community · 20d ago
Refiner: Robotics library from the ex-Hugging Face pre-training team
The ex-Hugging Face pre-training team just announced a new library created for robotics data refinement! It supports ingestion of all robotics formats (Parquet, HDF5, MCAP, Zarr, RLDS, and LeRobot), as well as the common processing flows like visual hand-tracking, subtask annotations… 26

arXiv — Machine Learning · research · 21d ago
Implicit Neural Representations of Individual Behavior
arXiv:2606.12200v1 Announce Type: new Abstract: We study policy representation learning from unlabeled multi-policy behavioral data. Each episode is generated by a fixed policy, but policy labels are unavailable. This setting appears in robotics play, demonstrations, games,… 28

arXiv — Machine Learning · research · 21d ago
Fourier Features Let Agents Learn High Precision Policies with Imitation Learning
arXiv:2606.12334v1 Announce Type: new Abstract: High-precision robotic manipulation requires fine-grained spatial reasoning that is often difficult to achieve with RGB-only policies due to depth ambiguity and perspective scale issues.
Policies that leverage 3D information… 14

arXiv — NLP / Computation & Language · research · 21d ago
Detecting AI-Generated Content on Social Media with Multi-modal Language Models
arXiv:2606.11200v1 Announce Type: new Abstract: Generative AI has enabled the creation of photorealistic images and videos that are increasingly disseminated on social media, often used for spam, misinformation, manipulation, and fraud. Existing AI-generated content (AIGC)… 36

arXiv — NLP / Computation & Language · research · 21d ago
When Does Language Matter? Multilingual Instructions Reveal Step-wise Language Sensitivity in Vision-Language-Action Models
arXiv:2606.11906v1 Announce Type: new Abstract: Vision-Language-Action (VLA) models have shown strong performance in language-conditioned robotic manipulation, yet their robustness to linguistic variation remains poorly understood. In this work, we present the first systematic… 17

Hugging Face Daily Papers · research · 21d ago
World Pilot: Steering Vision-Language-Action Models with World-Action Priors
Abstract: World Pilot enhances Vision-Language-Action models by incorporating dynamic scene evolution and trajectory priors from a World-Action Model, achieving superior performance in zero-shot out-of-distribution manipulation tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct… 10

Hugging Face Daily Papers · research · 22d ago
BrainSurgery: Reproducible and Reliable Declarative Weight Manipulations for Model Editing and Upcycling
Abstract: BrainSurgery is a tool for robust and reproducible tensor manipulation of neural network checkpoints through declarative YAML plans with built-in validation.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct. As deep learning models scale, managing, inspecting, and modifying… 12

arXiv — Machine Learning · research · 22d ago
Co-GLANCE: Uncertainty-Aware Active Perception for Heterogeneous Robot Teaming
arXiv:2606.09919v1 Announce Type: new Abstract: Perceptual uncertainty is a central challenge for heterogeneous robot teams operating in unstructured outdoor environments, where no single viewpoint affords reliable scene understanding. Perceptual uncertainty, arising from… 36

arXiv — NLP / Computation & Language · research · 22d ago
TabClaw: An Interactive and Self-Evolving Agent for Spreadsheet Manipulation and Table Reasoning
arXiv:2606.10316v1 Announce Type: new Abstract: Spreadsheets and tables are widely used representations for structured data analysis, but effective analysis still requires substantial manual effort and domain expertise. Recent large language model (LLM) agents can automate parts… 31

arXiv — NLP / Computation & Language · research · 22d ago
Beyond APIs: Probing the Limits of MLLMs in Physical Tool Use
arXiv:2606.10803v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) excel at utilizing digital APIs and increasingly serve as the "brain" of embodied AI, instructing robots to interact with the physical world. In such embodied settings, a central capability… 38

Hugging Face Daily Papers · research · 22d ago
ABot-Earth 0.5: Generative 3D Earth Model
Abstract: ABot-Earth 0.5 generates realistic 3D environments from satellite imagery using 3D Gaussian Splatting representation, enabling fast synthesis and real-time visualization for Embodied AI applications. Generated by Qwen/Qwen2.5-Coder-32B-Instruct. We present ABot-Earth… 22

Hugging Face Daily Papers · research · 22d ago
VoLo: A Physical Orchestrator for Open-Vocabulary Long-Horizon Manipulation
Abstract: VoLoAgent enables physical orchestration by integrating vision-language models with robot capabilities for open-vocabulary long-horizon manipulation tasks.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct. Open-vocabulary long-horizon manipulation requires robots to reason… 24

The Information — AI · news-outlet · 22d ago
Kalshi Asks Some Customers For Employer Information
Prediction markets platform Kalshi is asking customers in some wagers to provide the name of their employer, industry and job function before making bets, to help the company crack down on potential insider trading. “For markets with heightened insider or manipulation risk, we… 23

TechCrunch — AI · news-outlet · 22d ago
Hey Siri, here’s what I actually want from AI
I'm desperate for a personal AI assistant, but do I really want to become the kind of person who can't function without the friendly robot voice in my phone? 4

Hugging Face Daily Papers · research · 22d ago
Robotic Policy Adaptation via Weight-Space Meta-Learning
Abstract: WIZARD is a weight-space meta-learning framework that generates task-specific LoRA parameters for frozen VLA policies using language instructions and demonstration videos, enabling efficient task adaptation without fine-tuning. Generated by… 31

Hugging Face Daily Papers · research · 22d ago
Light-WAM: Efficient World Action Models with State-Fusion Action Decoding
Abstract: Light-WAM is a lightweight world action model for robot manipulation that uses a compact video backbone and downsampled latent space for efficient future-video supervision, combined with a StateFusionActionExpert for direct action prediction. Generated by… 25

Google DeepMind · official-blog · 22d ago
Powering the future of robotics in Europe
Jun 09, 2026. Google DeepMind Accelerator selects 15 robotics companies from across Europe to join the program. Providing 3 months of intensive mentorship and technical support, enabling the… 22

Page 2 of 4