Hugging Face Daily Papers · 7 min read

Orca: The World is in Your Mind

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arXiv:2606.30534 · Project page: https://orca-wm.github.io/

Published on Jun 29 · Submitted by yh-wang on Jul 1 · #1 Paper of the day
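Authors: Yihao Wang, Yuheng Ji, Mingyu Cao, Yanqing Shen, Runze Xiao, Huaihai Lyu, Senwei Xie, Euan Liu, Klara Tian, Tianfeng Long, Yichi Zhang, Zhengliang Cai, Ruike Chen, Jifan Zhao, Ruochuan Shi, Zihan Tang, Jing Lyu, Wenxing Tan, Ningbo Zhang, Yangtao Hu, Yuming Gao, Xiansheng Chen, Junkai Zhao, Congsheng Xu, Boan Zhu, Ziqi Wang, Yupu Feng, Qiongqiong Zhang, Yingli Zhao, Yulong Ao, Shaoxuan Xie, You Liu, Guocai Yao, Leiduo Zhang, Xiaodan Liu, Yunyan Zhang, Yance Jiao, Xinyan Yang, Jiaxing Wei, Xu Liu, Tengfei Pan, Shaokai Nie, Chunlei Men, Sen Cui, Xiaojie Jin, Hongyang Li, Jianlan Luo, Yao Mu, Yunchao Wei, Jun Yan, Hang Zhao, Xiaolong Zheng, Jiaming Li, Yonghua Lin, Tiejun Huang, Zhongyuan Wang, Pengwei Wang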

Abstract

Orca establishes a unified world latent space through next-state-prediction modeling using multimodal data and demonstrates superior performance in downstream tasks compared to specialized baselines.

We introduce Orca, an initial instantiation of a general world foundation model. Orca learns a unified world latent space from multimodal world signals and exposes it through multimodal readout interfaces. Rather than optimizing isolated next-token, next-frame, or next-action prediction, Orca is centered on Next-State-Prediction modeling, which offers a unified state-transition route toward understanding, predicting, and acting upon the world. Orca learns through two complementary paradigms: unconscious learning captures dense natural state transitions from continuous videos, and conscious learning models sparse, meaningful state transitions through language-described events and VQA supervision. For pre-training, we construct a large-scale world-learning data inventory comprising 125K hours of video and 160M event annotations. After pre-training, Orca has learned a unified world latent space. To examine whether this latent space supports downstream tasks, we evaluate it through three representative readouts: text generation, image prediction, and embodied action generation. In this evaluation Orca's backbone is frozen, and only the lightweight modality-specific decoders are trained. Experiments show that the proposed paradigm scales and verify that a stronger world latent enables stronger downstream readouts. Orca outperforms similar-sized specialized baselines. These results show that Orca, as a general world foundation model, presents a promising approach to understanding, predicting, and acting upon the world. Finally, we discuss the current limitations, aiming to provide useful insights and inspiration for the community.
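
The frozen-backbone evaluation described in the abstract (fixed backbone, only lightweight decoders trained) can be illustrated with a toy sketch. This is not Orca's architecture or training code: the random tanh projection standing in for a pre-trained backbone, the scalar linear readout, the synthetic data, and every function name below are assumptions made purely for illustration.

```python
import math
import random


def make_frozen_backbone(in_dim, latent_dim, seed=0):
    """Fixed random projection standing in for a pre-trained backbone (hypothetical)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(latent_dim)]


def encode(backbone, x):
    """Map an observation x to a latent vector; the backbone weights are only read."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in backbone]


def readout(decoder, z):
    """Lightweight linear decoder: one scalar prediction from the latent."""
    return sum(w * zi for w, zi in zip(decoder, z))


def mse(backbone, decoder, data):
    """Mean squared error of the readout over (observation, target) pairs."""
    return sum((readout(decoder, encode(backbone, x)) - y) ** 2 for x, y in data) / len(data)


def train_readout(backbone, data, lr=0.05, epochs=200):
    """Fit only the decoder with SGD on squared error; the backbone is never updated."""
    decoder = [0.0] * len(backbone)
    for _ in range(epochs):
        for x, y in data:
            z = encode(backbone, x)
            err = readout(decoder, z) - y
            decoder = [w - lr * err * zi for w, zi in zip(decoder, z)]
    return decoder


def make_toy_data(n, in_dim=4, seed=0):
    """Synthetic task: the target is a fixed linear function of the observation."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in range(in_dim)]
        data.append((x, x[0] - 0.5 * x[1]))
    return data


if __name__ == "__main__":
    backbone = make_frozen_backbone(in_dim=4, latent_dim=8, seed=1)
    data = make_toy_data(50, seed=2)
    before = mse(backbone, [0.0] * 8, data)
    decoder = train_readout(backbone, data)
    after = mse(backbone, decoder, data)
    print(f"readout MSE before training: {before:.4f}, after: {after:.4f}")
```

Because gradients only ever touch the decoder, readout quality in a setup like this depends on what the fixed latent already encodes, which is the logic behind using frozen-backbone readouts as a probe of latent quality.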

Community

yh-wang (paper author, paper submitter) · 2 days ago · edited about 18 hours ago

Orca: An initial instantiation of a general world foundation model

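English version of the video demo: https://cdn-uploads.huggingface.co/production/uploads/68c429ef60f075cc46cc9cff/x_k0Y8vWVrGGuk-uHfWv3.mp4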

Lionel Li (abcdvzz):

Why can't I see this paper on the daily-paper page anymore?

yh-wang (reply):

Done! Thank you!

yh-wang (paper author, paper submitter) · about 24 hours ago · edited about 18 hours ago

Orca: an initial instantiation of a general world foundation model.

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

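The following papers were recommended by the Semantic Scholar API:

* Motion-Focused Latent Action Enables Cross-Embodiment VLA Training from Human EgoVideos (2026), https://huggingface.co/papers/2606.18955
* RepWAM: World Action Modeling with Representation Visual-Action Tokenizers (2026), https://huggingface.co/papers/2606.13674
* RotVLA: Rotational Latent Action for Vision-Language-Action Model (2026), https://huggingface.co/papers/2605.13403
* GeoSem-WAM: Geometry- and Semantic-Aware World Action Models (2026), https://huggingface.co/papers/2606.03188
* Geometric Action Model for Robot Policy Learning (2026), https://huggingface.co/papers/2606.17046
* HiMem-WAM: Hierarchical Memory-Gated World Action Models for Robotic Manipulation (2026), https://huggingface.co/papers/2606.10363
* STORM: Internalized Modeling for Spatial-Temporal Reasoning in Video-Language Models (2026), https://huggingface.co/papers/2605.26014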
