Hugging Face Daily Papers · June 30, 2026 · 3 min read

Walking in the Implicit: Interactive World Exploration via Neural Scene Representation

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Authors: Zhiqi Li, Chengrui Dong, Zhenhua Du, Hangning Zhou, Cong Qiu, Hailong Qin, Mu Yang, Dongxu Wei, Peidong Liu

Project page: https://lizhiqi49.github.io/NeuWorld
GitHub: https://github.com/WU-CVGL/NeuWorld (26 stars)

arxiv:2606.30045

Published on Jun 29 · Submitted by Zhiqi Li on Jun 30

Abstract

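NeuWorld enables efficient interactive video generation by representing each scene as a compact, fixed-length neural implicit state: a transformer VAE learns the state from sparse posed frames, a diffusion transformer evolves it along camera trajectories, and observations are rendered from it by camera pose.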

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Interactive video generation systems for camera-controlled world exploration roll out growing sequences of latent video frames, entangling state transition with high-frequency observation synthesis. We propose Walking in the Implicit, a scene-centric paradigm that changes the rollout variable from frame latents to a fixed-length, renderable implicit state, termed Neural Implicit Scene (NIS). This factorizes interactive generation into stochastic transition of a compact scene state and deterministic pose-conditioned rendering given the sampled state. We instantiate this paradigm as NeuWorld: a transformer VAE learns locally anchored NIS from sparse posed frames, and a diffusion transformer evolves NIS conditioned on future camera trajectories and geometry-aware retrieved history. By reusing the VAE encoder as a unified conditioner, NeuWorld maps camera, reference-image, and history cues into the same NIS modality, avoiding external heterogeneous encoders. Trained from scratch on public posed-view data without pretrained video backbones or auxiliary 3D reconstructors, NeuWorld achieves strong long-horizon consistency with favorable inference efficiency.
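The factorization described above (a stochastic transition of a fixed-length scene state, followed by deterministic pose-conditioned rendering) can be illustrated with a toy rollout loop. This is only a minimal sketch and not NeuWorld's code: `Pose`, `STATE_LEN`, `transition`, `render`, and `rollout` are hypothetical names, the arithmetic inside them is a placeholder for the learned networks, and the geometry-aware history retrieval is reduced to conditioning on the previous state.

```python
import random
from dataclasses import dataclass

STATE_LEN = 8  # hypothetical size of the fixed-length scene state


@dataclass(frozen=True)
class Pose:
    """Toy camera pose (illustrative only): 2D position plus heading."""
    x: float
    y: float
    yaw: float


def transition(state, trajectory, rng):
    """Stochastic transition: sample the next scene state from the current
    state and the upcoming camera trajectory. Placeholder for a learned
    generative model; here it is a decay, a drift, and Gaussian noise."""
    drift = sum(p.x + p.y for p in trajectory) / max(len(trajectory), 1)
    return [0.9 * s + 0.1 * drift + rng.gauss(0.0, 0.05) for s in state]


def render(state, pose):
    """Deterministic pose-conditioned rendering: the same state and pose
    always yield the same observation. Placeholder for a learned renderer."""
    return [s * (1.0 + 0.1 * pose.yaw) + pose.x - pose.y for s in state]


def rollout(initial_state, trajectories, seed=0):
    """Roll out the scene state chunk by chunk and render every queried pose.

    The rollout variable is the state, whose length stays STATE_LEN no matter
    how many frames are rendered; only the output frame list grows.
    """
    rng = random.Random(seed)
    state = list(initial_state)
    frames = []
    for trajectory in trajectories:
        state = transition(state, trajectory, rng)  # all randomness lives here
        frames.extend(render(state, pose) for pose in trajectory)
    return state, frames


if __name__ == "__main__":
    chunks = [
        [Pose(0.0, 0.0, 0.0), Pose(0.1, 0.0, 0.0)],
        [Pose(0.2, 0.0, 0.1)],
    ]
    final_state, frames = rollout([0.0] * STATE_LEN, chunks)
    print(len(final_state), len(frames))
```

In this sketch all randomness sits in `transition`, so re-rendering a sampled state from the same pose gives an identical observation, and the memory carried between steps does not grow with the number of frames generated.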

Community

lzq49

Paper submitter about 21 hours ago

Our system rolls out a fixed-length, renderable Neural Implicit Scene state and renders queried observations under camera control.

Get this paper in your agent:

hf papers read 2606.30045

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
