SpheRoPE: Zero-Shot Optimization-Free 360 Panorama Generation with Spherical RoPE
Or Hirschorn, Aaron Olender, Eli Alshan, Ianir Ideses, Lior Fritz, Sagie Benaim
Published June 30, 2026. arXiv: 2606.32033. Code: https://github.com/orhir/SpheRoPE
Abstract
A zero-shot framework injects spherical priors into pre-trained diffusion transformers for 360° panorama generation, using Spherical RoPE and Semantic Distortion guidance to satisfy the topological constraints of the equirectangular projection without training or optimization.
We present a zero-shot framework, free of both training and optimization, that generates 360° panoramic images and videos by injecting spherical priors directly into pre-trained diffusion transformers. Existing methods either rely on costly fine-tuning on scarce panoramic data, which limits generalization, or on multi-step optimization, which incurs prohibitive inference latency. We observe that contemporary generative models already exhibit some panoramic priors acquired through large-scale training. These emergent priors are insufficient, however: the models fail to satisfy the strict topological constraints of the equirectangular projection (ERP), in which the left and right image edges must join seamlessly and the entire top row and the entire bottom row each correspond to a single point (the poles). Our approach enforces these constraints at inference time. Spherical RoPE replaces the standard rotary position embeddings (RoPE): low-frequency channels are re-parameterized with 3D Cartesian coordinates so that they encode the spherical manifold directly, while high-frequency channels are harmonically quantized so that they are exactly periodic across the 360° horizontal wrap-around. Combined with Semantic Distortion classifier-free guidance (CFG), a complementary guidance scheme that explicitly steers geometry, Spherical RoPE lets us avoid retraining and inherit the full creative breadth of state-of-the-art models. The approach generalizes across backbones and 360° generation modalities: we demonstrate text-to-panorama image and video generation with the Flux.1, Flux.2, and LTX-Video backbones, achieving performance competitive with baselines while remaining entirely training-free. Project page: https://orhir.github.io/SpheRoPE
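The abstract describes Spherical RoPE only at a high level. The NumPy sketch below illustrates one plausible reading of it and is not the authors' implementation. Every detail here is our assumption rather than something stated in the paper: ERP tokens are mapped to the sphere at their centers; a rotary channel counts as "high-frequency" if it completes at least one full turn across the panorama width W, and such channels are rounded to the nearest integer multiple of 2π/W; the remaining low-frequency channels are driven by the token's x, y, z coordinates on a sphere of radius W/2π, cycling through the three axes; and only the channel group that a standard 2D RoPE would assign to the horizontal axis is modeled. The function names are hypothetical.

```python
import numpy as np

# Hypothetical sketch of a spherical RoPE; not the SpheRoPE reference implementation.


def erp_to_sphere(rows, cols, height, width):
    """Map ERP token positions (row, col) to Cartesian points on the unit sphere.

    Assumes token-center sampling: longitude spans [-pi, pi) left to right and
    latitude runs from just below +pi/2 (top) to just above -pi/2 (bottom).
    """
    lat = np.pi / 2 - np.pi * (rows + 0.5) / height
    lon = 2 * np.pi * (cols + 0.5) / width - np.pi
    return np.stack([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)], axis=-1)


def quantize_to_harmonics(freqs, width):
    """Round angular frequencies (radians per token) to integer multiples of 2*pi/width.

    After `width` tokens every quantized channel has rotated a whole number of
    turns, so the left and right ERP edges receive identical phases.
    """
    fundamental = 2 * np.pi / width
    return np.round(freqs / fundamental) * fundamental


def spherical_rope_angles(rows, cols, height, width, n_pairs=16, base=10000.0):
    """Hypothetical per-token rotation angles, shape (..., n_pairs)."""
    rows, cols = np.broadcast_arrays(np.asarray(rows, float), np.asarray(cols, float))
    # Standard RoPE frequencies base^(-2i/d) with d = 2 * n_pairs (high -> low).
    freqs = base ** (-np.arange(n_pairs) / n_pairs)
    # Assumed split: "high-frequency" = completes at least one turn across the width.
    high = freqs * width >= 2 * np.pi
    angles = np.empty(cols.shape + (n_pairs,))
    # High-frequency channels: 1D RoPE over the column index with harmonic frequencies.
    angles[..., high] = cols[..., None] * quantize_to_harmonics(freqs[high], width)
    # Low-frequency channels: 3D position on a sphere whose equator is `width`
    # tokens long (radius = width / 2pi), cycling the channels over x, y, z.
    xyz = (width / (2 * np.pi)) * erp_to_sphere(rows, cols, height, width)
    axis = np.arange(int((~high).sum())) % 3
    angles[..., ~high] = xyz[..., axis] * freqs[~high]
    return angles


def apply_rope(x, angles):
    """Rotate consecutive channel pairs of x (..., 2 * n_pairs) by angles (..., n_pairs)."""
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[..., 0::2] = x[..., 0::2] * cos - x[..., 1::2] * sin
    out[..., 1::2] = x[..., 0::2] * sin + x[..., 1::2] * cos
    return out


if __name__ == "__main__":
    H, W = 32, 64
    rows, cols = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    angles = spherical_rope_angles(rows, cols, H, W)  # shape (32, 64, 16)
    queries = np.random.default_rng(0).normal(size=(H, W, 32))
    rotated = apply_rope(queries, angles)
    print(angles.shape, rotated.shape)
```

Under these assumptions, shifting a token by exactly W columns leaves every rotation unchanged modulo 2π, so the two edges of the ERP see consistent positions. Quantizing only the channels that already complete a full turn changes each of those frequencies by at most half of 2π/W, while the slow channels, which could not be made periodic without large distortion, are the ones handed to the smooth, seam-free Cartesian coordinates.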
Community
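This is an automated message from the Librarian Bot. The following papers, recommended by the Semantic Scholar API, are similar to this paper:
* SHERPA: Seam-aware Harmonized ERP Adaptation for Open-Domain 360° Panorama Generation (2026), https://huggingface.co/papers/2606.12213
* DTG-Restore: Training-Free Diffusion Refinement for Generative Video Super-Resolution (2026), https://huggingface.co/papers/2605.30431
* PanoWorld: Geometry-Consistent Panoramic Video World Modeling (2026), https://huggingface.co/papers/2605.15391
* COLLAR: Cascaded Object-Level Latent Refinement for High-Fidelity Conditional Generation (2026), https://huggingface.co/papers/2606.00954
* Real2SAM2Real: Generative 3D Caches as Complementary Context for Video Diffusion (2026), https://huggingface.co/papers/2606.00299
* TrioPose: Native Triple-Stream Diffusion Transformers for Pose-Guided Text-to-Image Generation (2026), https://huggingface.co/papers/2606.07053
* GeoEdit: Geometry-Aware Object Editing via Dual-Branch Denoising (2026), https://huggingface.co/papers/2606.30003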