Learning Transferable Dynamics Priors from Action to World Modeling
Authors: Ze Huang, Jiahui Zhang, Hairuo Liu, Chenxi Zhang, Ran Cheng, Li Zhang
Published: 2026-06-28 (arXiv:2606.29501)
Abstract
Action-conditioned world modeling enables transferable dynamics priors for robot learning through pretraining on large-scale manipulation data, supporting both simulator-based policy evaluation and video-action prediction.
We study action-conditioned world modeling as a scalable way to learn transferable dynamics priors for robot learning. By pretraining a model to predict how actions drive visual scene evolution, the resulting world model captures reusable interaction dynamics beyond appearance-level video generation. Concretely, we pretrain a multi-view interactive base diffusion world model, A2World, on large-scale robot manipulation data with real action annotations. We validate the learned dynamics priors from two complementary perspectives. First, we adapt A2World into a task- or scene-specialized real-world simulator, A2World-sim, whose long-horizon rollouts support simulator-based policy evaluation and scalable what-if analysis by replacing real-robot rollouts with world model rollouts. Second, starting from the same pretrained weights, we adapt A2World into a video-action joint prediction model, A2World-policy, that predicts actions under visual and instruction conditioning. Experiments across simulation benchmarks and real-robot settings demonstrate that action-conditioned world model pretraining yields transferable dynamics priors that benefit both simulator-centric and policy-centric robot learning.
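The simulator-based policy evaluation described in the abstract can be illustrated with a toy closed-loop rollout. This is a minimal sketch under stated assumptions, not A2World's implementation or API: `ToyWorldModel`, `rollout`, `success_rate`, and `make_reach_policy` are hypothetical names, the "observation" is a short list of floats rather than multi-view video, and the dynamics are a hand-written additive rule instead of a learned diffusion model. It only shows the shape of the loop in which a policy acts on predicted observations instead of real-robot observations.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

State = List[float]   # stand-in for an observation (the paper's model predicts video frames)
Action = List[float]
Policy = Callable[[State], Action]


@dataclass
class ToyWorldModel:
    """Hypothetical stand-in for a learned action-conditioned dynamics model."""
    step_size: float = 1.0

    def predict(self, obs: State, action: Action) -> State:
        # Toy dynamics: the action displaces the observation directly.
        return [o + self.step_size * a for o, a in zip(obs, action)]


def rollout(model: ToyWorldModel, policy: Policy, obs: State, horizon: int) -> List[State]:
    """Closed-loop rollout: every observation after the first is a model prediction."""
    trajectory = [obs]
    for _ in range(horizon):
        action = policy(obs)
        obs = model.predict(obs, action)
        trajectory.append(obs)
    return trajectory


def success_rate(model: ToyWorldModel, policy: Policy, starts: Sequence[State],
                 goal: State, horizon: int, tol: float = 1e-2) -> float:
    """Fraction of rollouts whose final predicted state lies within tol of the goal."""
    successes = 0
    for start in starts:
        final = rollout(model, policy, start, horizon)[-1]
        dist = sum((f - g) ** 2 for f, g in zip(final, goal)) ** 0.5
        successes += dist <= tol
    return successes / len(starts)


def make_reach_policy(goal: State, gain: float) -> Policy:
    """Proportional controller that moves toward the goal; gain sets how fast."""
    def policy(obs: State) -> Action:
        return [gain * (g - o) for o, g in zip(obs, goal)]
    return policy


GOAL = [1.0, 0.0]
STARTS = [[0.0, 0.0], [2.0, 1.0], [-1.0, 0.5]]
model = ToyWorldModel()

# Compare two candidate policies without any real-robot rollouts.
good = success_rate(model, make_reach_policy(GOAL, 0.5), STARTS, GOAL, horizon=20)
weak = success_rate(model, make_reach_policy(GOAL, 0.01), STARTS, GOAL, horizon=20)
print(f"gain=0.5: {good:.2f}, gain=0.01: {weak:.2f}")
```

Running the same start states through different policies or action sequences is also the basic pattern behind what-if analysis. In a learned world model the prediction error compounds over the horizon, which this toy example does not capture.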