Hugging Face Daily Papers · July 1, 2026 · 7 min read

PhotoQuilt: Training-Free Arbitrary-Resolution Photomosaics via Bootstrapped Tiled Denoising

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

TLDR: PhotoQuilt is a training-free way to make photomosaics, big images where each tile is a convincing little picture on its own, yet together they form a coherent scene. It sketches the whole layout at low resolution first, then upscales and denoises each tile on its own, so tiles get sharp detail while staying anchored to the global structure. Because tiles are handled separately, it scales to huge canvases cheaply and beats prior methods on both the big picture and the fine detail.

arxiv:2606.30968

Published on Jun 29 · Submitted by Javad Rajabi on Jul 1 · University of Toronto

Authors: Koorosh Roohi, Javad Rajabi, Andrew Fleet, Babak Taati

Abstract

PhotoQuilt is a training-free framework that generates high-resolution photomosaics by combining global layout composition with separate tile generation in latent space, overcoming limitations of diffusion models in balancing local detail and global structure.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Photomosaics are large images whose local regions are seen as independent tiles while their overall arrangement forms a coherent scene. Generating them at high resolution, with every tile convincing in its own right, is computationally expensive, since the canvas must hold many detailed tiles at once. We present PhotoQuilt, a training-free framework that generates photomosaics at arbitrary resolution. Diffusion models struggle to satisfy both scales at once, as direct high-resolution generation is costly and tends toward one smooth image rather than a mosaic, while patch-based tiling keeps local detail but loses global structure. PhotoQuilt resolves this with a bootstrapped tiled denoising procedure. We first produce a global composition at low resolution to fix the layout, then upscale it in latent space and re-inject noise to restore generative capacity. Denoising proceeds within fixed tiles, so each forms its own image while the shared global structure holds them in one layout. Because tile generation is handled separately, PhotoQuilt scales to large canvases without quadratic attention cost. Experiments show that PhotoQuilt outperforms current baselines on both global structure and local realism.
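The pipeline described in the abstract (low-resolution layout, latent upscaling, noise re-injection, per-tile denoising) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_denoise_tile` is a hypothetical stand-in for a real diffusion denoiser, and the nearest-neighbour upscaling and the variance-preserving noise blend are choices made for this sketch rather than details taken from the paper. All function names are illustrative. The `attention_pairs` helper only counts token pairs to show why handling tiles separately avoids the quadratic cost of attending over the whole canvas.

```python
import numpy as np


def upscale_latent(z, factor):
    """Nearest-neighbour upscaling of a (C, H, W) latent by an integer factor."""
    return z.repeat(factor, axis=1).repeat(factor, axis=2)


def reinject_noise(z, strength, rng):
    """Blend the latent with fresh Gaussian noise (assumed variance-preserving form)."""
    noise = rng.standard_normal(z.shape)
    return np.sqrt(1.0 - strength**2) * z + strength * noise


def toy_denoise_tile(tile, steps=10):
    """Hypothetical stand-in for a diffusion denoiser: repeated 4-neighbour smoothing.

    It only reads values inside `tile`, so tiles cannot influence each other.
    """
    out = tile.copy()
    for _ in range(steps):
        p = np.pad(out, ((0, 0), (1, 1), (1, 1)), mode="edge")
        neighbours = p[:, :-2, 1:-1] + p[:, 2:, 1:-1] + p[:, 1:-1, :-2] + p[:, 1:-1, 2:]
        out = 0.5 * out + 0.125 * neighbours
    return out


def tiled_denoise(z, tile, denoise_fn):
    """Denoise each fixed, non-overlapping tile independently and write it back."""
    _, h, w = z.shape
    assert h % tile == 0 and w % tile == 0, "canvas must be a whole number of tiles"
    out = np.empty_like(z)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            out[:, y:y + tile, x:x + tile] = denoise_fn(z[:, y:y + tile, x:x + tile])
    return out


def photomosaic_sketch(low_res, factor, tile, strength, seed=0):
    """Low-res layout -> upscale -> re-noise -> independent per-tile denoising."""
    rng = np.random.default_rng(seed)
    z = upscale_latent(low_res, factor)
    z = reinject_noise(z, strength, rng)
    return tiled_denoise(z, tile, toy_denoise_tile)


def attention_pairs(tokens_per_tile, num_tiles):
    """Token-pair counts for self-attention: (whole canvas, tile by tile)."""
    total_tokens = tokens_per_tile * num_tiles
    return total_tokens**2, num_tiles * tokens_per_tile**2


# Usage: a random array stands in for the low-resolution global composition.
low_res_layout = np.random.default_rng(1).standard_normal((4, 8, 8))
canvas = photomosaic_sketch(low_res_layout, factor=4, tile=8, strength=0.6)
print(canvas.shape)
print(attention_pairs(tokens_per_tile=1024, num_tiles=16))
```

With a fixed tile size, the per-tile pair count grows linearly with the number of tiles, while the whole-canvas count grows quadratically; for 16 tiles the per-tile count is 16 times smaller. In a real system the toy denoiser would be replaced by a pretrained diffusion model run on each tile.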

Project page: https://kooroshrh.github.io/photo-quilt/

Community

Nova2001 (paper author, paper submitter) · about 22 hours ago

VLAD545645645 · about 8 hours ago

Interesting work!!
But now we need a project with the opposite effect: one where all the tiles are consistent and make sense in relation to each other, for editing ultra-high-resolution images! =)