Hugging Face Daily Papers · June 30, 2026 · 4 min read

Learning Transferable Dynamics Priors from Action to World Modeling

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.


We study action-conditioned world modeling as a scalable way to learn transferable dynamics priors for robot learning. By pretraining a model to predict how actions drive visual scene evolution, the resulting world model captures reusable interaction dynamics beyond appearance-level video generation. Concretely, we pretrain a multi-view interactive base diffusion world model, A2World, on large-scale robot manipulation data with real action annotations. We validate the learned dynamics priors from two complementary perspectives. First, we adapt A2World into a task- or scene-specialized real-world simulator, A2World-sim, whose long-horizon rollouts support simulator-based policy evaluation and scalable what-if analysis by replacing real-robot rollouts with world model rollouts. Second, starting from the same pretrained weights, we adapt A2World into a video-action joint prediction model, A2World-policy, that predicts actions under visual and instruction conditioning. Experiments across simulation benchmarks and real-robot settings demonstrate that action-conditioned world model pretraining yields transferable dynamics priors that benefit both simulator-centric and policy-centric robot learning.
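The simulator-based policy evaluation mentioned in the abstract can be illustrated with a toy sketch. This is not A2World's code or API: `toy_world_model`, `rollout`, `success_rate`, and `make_proportional_policy` are hypothetical names, observations are short lists of floats rather than multi-view video, and the "learned" dynamics are a hand-written additive rule. The only point is the shape of the loop, in which a policy is scored on rollouts produced by an action-conditioned model rather than on a real robot.

```python
from typing import Callable, List, Sequence

Obs = List[float]     # stand-in for an observation (the paper uses multi-view video)
Action = List[float]  # stand-in for a robot action vector

def toy_world_model(obs: Obs, action: Action) -> Obs:
    """Hypothetical stand-in for a learned action-conditioned world model:
    maps (current observation, action) to a predicted next observation."""
    return [o + a for o, a in zip(obs, action)]

def rollout(model: Callable[[Obs, Action], Obs],
            policy: Callable[[Obs], Action],
            obs0: Obs, horizon: int) -> List[Obs]:
    """Run the policy inside the model for `horizon` steps."""
    obs = list(obs0)
    trajectory = [obs]
    for _ in range(horizon):
        obs = model(obs, policy(obs))  # the model replaces the real robot step
        trajectory.append(obs)
    return trajectory

def success_rate(model, policy, starts: Sequence[Obs], goal: Obs,
                 horizon: int, tol: float = 0.05) -> float:
    """Fraction of start states whose final predicted state is within `tol` of the goal."""
    successes = 0
    for obs0 in starts:
        final = rollout(model, policy, obs0, horizon)[-1]
        if max(abs(f - g) for f, g in zip(final, goal)) <= tol:
            successes += 1
    return successes / len(starts)

def make_proportional_policy(goal: Obs, gain: float) -> Callable[[Obs], Action]:
    """Toy policy: move a fixed fraction of the remaining distance to the goal."""
    def policy(obs: Obs) -> Action:
        return [gain * (g - o) for o, g in zip(obs, goal)]
    return policy

goal = [1.0, -1.0]
starts = [[0.0, 0.0], [2.0, 1.0], [-1.0, 0.5]]
good = make_proportional_policy(goal, gain=0.5)
idle = make_proportional_policy(goal, gain=0.0)  # never moves

print(success_rate(toy_world_model, good, starts, goal, horizon=10))  # 1.0
print(success_rate(toy_world_model, idle, starts, goal, horizon=10))  # 0.0
```

Swapping `good` for a different policy, or `starts` for a different set of initial states, is the toy analogue of the what-if analysis the abstract describes. How well such scores track real-robot outcomes depends on the fidelity of the learned model, which this sketch does not address.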


arxiv:2606.29501


Published on Jun 28

Submitted by Jiahui Zhang on Jun 30


Authors: Ze Huang, Jiahui Zhang, Hairuo Liu, Chenxi Zhang, Ran Cheng, Li Zhang

Abstract

Action-conditioned world modeling enables transferable dynamics priors for robot learning through pretraining on large-scale manipulation data, supporting both simulator-based policy evaluation and video-action prediction.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct



Get this paper in your agent:

hf papers read 2606.29501

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
