Hugging Face Daily Papers · June 9, 2026 · 6 min read

Skill-3D: Evolving Scene-Aware Skills for Agentic 3D Spatial Reasoning

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.


This paper explores agentic 3D spatial understanding, i.e., MLLM agents performing 3D reasoning through tool use. Existing methods often misuse tools and exhibit biased tool preferences in 3D scenarios, leaving the agentic paradigm with only marginal gains over non-agentic strategies. We show that 3D spatial reasoning tasks are heterogeneous across scenes, yet these agents apply a uniform tool-use strategy to all scenes rather than selecting tools according to the specific scene and task.

To address this, we propose Skill-3D, a framework that learns self-evolving scene-aware skills. Specifically, Skill-3D identifies the task scene and records the agent's tool-use trajectory in a Scene Memory, where successful trajectories from similar scenes are aggregated and distilled into a reusable scene-aware skill, with failed ones attached to the skill as lessons. During training, once a similar scene recurs, the corresponding skill is injected to guide the agent, producing new trajectories whose successes and failures further refine the skill. This forms a loop in which the memory and the skill library co-evolve.

Experiments show that Skill-3D substantially improves tool utilization in 3D spatial reasoning (from 39% to 78% on VSI-Bench), driving the agent toward correct and sufficient tool use. For instance, it improves Gemini-3-Flash by 67% on MMSI-Bench. Furthermore, we conduct agentic post-training over skill-guided trajectories, which boosts Qwen3-VL-8B by 43% on VSI-Bench.
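The record, distill, and inject loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: every name here (`Trajectory`, `Skill`, `SceneMemory`, `render_hint`, and the tool names in the usage note) is hypothetical, scene similarity is reduced to an exact match on a scene label, and "distillation" is approximated by picking the most frequent successful tool sequence, whereas the paper presumably uses a richer procedure.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Trajectory:
    scene: str               # scene label for the task (assumed to be given)
    tools: tuple[str, ...]   # ordered tool calls the agent made
    success: bool            # whether the final answer was correct


@dataclass
class Skill:
    scene: str
    recommended_tools: tuple[str, ...] = ()
    lessons: list[tuple[str, ...]] = field(default_factory=list)


class SceneMemory:
    """Toy stand-in for a scene memory paired with a skill library."""

    def __init__(self) -> None:
        self.trajectories: dict[str, list[Trajectory]] = {}
        self.skills: dict[str, Skill] = {}

    def record(self, traj: Trajectory) -> None:
        # Store the trajectory, then re-distill the skill for that scene,
        # so memory and skill library are updated together.
        self.trajectories.setdefault(traj.scene, []).append(traj)
        self.skills[traj.scene] = self._distill(traj.scene)

    def _distill(self, scene: str) -> Skill:
        trajs = self.trajectories[scene]
        successes = Counter(t.tools for t in trajs if t.success)
        skill = Skill(scene=scene)
        if successes:
            # Simplification: the most common successful tool sequence.
            skill.recommended_tools = successes.most_common(1)[0][0]
        # Failed tool sequences are attached as lessons, without duplicates.
        for t in trajs:
            if not t.success and t.tools not in skill.lessons:
                skill.lessons.append(t.tools)
        return skill

    def lookup(self, scene: str) -> Skill | None:
        # Simplification: "similar scene" means an identical label.
        return self.skills.get(scene)


def render_hint(skill: Skill) -> str:
    """Format a skill as text that could be injected into an agent prompt."""
    lines = [f"Scene: {skill.scene}"]
    if skill.recommended_tools:
        lines.append("Suggested tools: " + " -> ".join(skill.recommended_tools))
    for lesson in skill.lessons:
        lines.append("Avoid: " + " -> ".join(lesson))
    return "\n".join(lines)
```

In use, each finished episode would be passed to `record`, and before a new episode the agent would call `lookup` with the scene label and prepend `render_hint(skill)` to its prompt if a skill exists. For example, after recording two successful `("depth", "measure")` runs and one failed `("caption",)` run for a hypothetical `"indoor_room"` scene, the skill recommends the first sequence and lists the second as a lesson.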


arxiv:2606.07436


Published on Jun 5, 2026 · Submitted by Haoyuan Li on Jun 9 · Zhejiang University


Authors: Haoyuan Li, Zhengdong Hu, Jun Wang, Hehe Fan, Yi Yang

Abstract

The Skill-3D framework enables agents to learn scene-aware skills through a self-evolving memory and skill library, improving tool utilization in 3D spatial reasoning tasks.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Project page: https://skill-3d.github.io/ · GitHub: https://github.com/skill-3d/Skill-3D
