Hugging Face Daily Papers · 4 min read

Learning Transferable Dynamics Priors from Action to World Modeling

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

We study action-conditioned world modeling as a scalable way to learn transferable dynamics priors for robot learning. By pretraining a model to predict how actions drive visual scene evolution, the resulting world model captures reusable interaction dynamics beyond appearance-level video generation. Concretely, we pretrain a multi-view interactive base diffusion world model, A2World, on large-scale robot manipulation data with real action annotations. We validate the learned dynamics priors from two complementary perspectives. First, we adapt A2World into a task- or scene-specialized real-world simulator, A2World-sim, whose long-horizon rollouts support simulator-based policy evaluation and scalable what-if analysis by replacing real-robot rollouts with world model rollouts. Second, starting from the same pretrained weights, we adapt A2World into a video-action joint prediction model, A2World-policy, that predicts actions under visual and instruction conditioning. Experiments across simulation benchmarks and real-robot settings demonstrate that action-conditioned world model pretraining yields transferable dynamics priors that benefit both simulator-centric and policy-centric robot learning.
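
The idea of replacing real-robot rollouts with world model rollouts for policy evaluation can be illustrated with a toy sketch. This is not the A2World implementation: `toy_world_model`, `toward_goal_policy`, `rollout`, and `success_rate` are hypothetical names, and the "scene" is assumed to be a 2-D position rather than multi-view video. The only point is the evaluation loop, in which the policy is scored entirely inside an action-conditioned model.

```python
import random
from typing import Callable, List

State = List[float]
Action = List[float]


def toy_world_model(state: State, action: Action) -> State:
    """Hypothetical stand-in for a learned action-conditioned model.

    Maps (current state, action) to a predicted next state. A real
    world model would predict future camera frames; here the state
    is just a position that the action shifts.
    """
    return [s + a for s, a in zip(state, action)]


def toward_goal_policy(state: State, goal: State, step: float = 0.1) -> Action:
    """Toy policy: move a fixed fraction of the way toward the goal."""
    return [step * (g - s) for s, g in zip(state, goal)]


def rollout(
    model: Callable[[State, Action], State],
    policy: Callable[[State, State], Action],
    start: State,
    goal: State,
    horizon: int,
) -> List[State]:
    """Roll the policy forward inside the model only (no real robot)."""
    state = list(start)
    trajectory = [state]
    for _ in range(horizon):
        action = policy(state, goal)
        state = model(state, action)
        trajectory.append(state)
    return trajectory


def success_rate(
    model: Callable[[State, Action], State],
    policy: Callable[[State, State], Action],
    goal: State,
    horizon: int,
    episodes: int = 100,
    tol: float = 0.05,
    seed: int = 0,
) -> float:
    """Fraction of imagined episodes that end within `tol` of the goal."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(episodes):
        start = [rng.uniform(-1.0, 1.0) for _ in goal]
        final = rollout(model, policy, start, goal, horizon)[-1]
        distance = sum((f - g) ** 2 for f, g in zip(final, goal)) ** 0.5
        if distance <= tol:
            successes += 1
    return successes / episodes


if __name__ == "__main__":
    goal = [0.5, 0.5]
    # "What-if" comparison: same policy, short versus long horizon.
    print(success_rate(toy_world_model, toward_goal_policy, goal, horizon=5))
    print(success_rate(toy_world_model, toward_goal_policy, goal, horizon=50))
```

In this sketch a what-if question (for example, how the horizon or the policy's step size affects success) costs only extra model rollouts, which is the property the abstract attributes to simulator-based evaluation.
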
Papers
arxiv:2606.29501

Published on Jun 28, 2026 · Submitted by Jiahui Zhang on Jun 30, 2026
Authors: Ze Huang, Jiahui Zhang, Hairuo Liu, Chenxi Zhang, Ran Cheng, Li Zhang

Abstract

Action-conditioned world modeling enables transferable dynamics priors for robot learning through pretraining on large-scale manipulation data, supporting both simulator-based policy evaluation and video-action prediction.

Get this paper in your agent:

hf papers read 2606.29501
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
