Hugging Face Daily Papers · June 30, 2026 · 5 min read

Nemotron-Labs-Diffusion-Image: Advancing Masked Discrete Diffusion for High-Resolution Image Synthesis

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Like Read original ↗

We propose Nemotron-Labs-Diffusion-Image, a state-of-the-art masked discrete diffusion model (MDM) for high-resolution text-to-image synthesis. Compared with prior work on masked image generation, Nemotron-Labs-Diffusion-Image addresses two key challenges. First, unlike continuous diffusion models which progressively refine latent representations across the entire image, standard MDMs lack self-correcting capability because discrete tokens cannot be modified once they are unmasked. Second, although increasing the vocabulary size of discrete image tokenizers improves reconstruction fidelity, it introduces optimization difficulties for generative modeling as the per-token training signal becomes increasingly sparse. To address the first challenge, Nemotron-Labs-Diffusion-Image incorporates a token-editing mechanism that enables the model to dynamically revise already-unmasked tokens during inference, similar to how a sculptor iteratively refines their work. To tackle the second challenge, we propose a Grouped Cross-Entropy (GCE) objective that assigns positive learning signals to tokens neighboring the ground truth in embedding space, thereby alleviating signal sparsity. To further improve training efficiency, we implement a custom fused operator for GCE that significantly reduces VRAM usage in large-vocabulary settings. Experimental results demonstrate that these innovations substantially improve both training efficiency and image fidelity of masked discrete image generators, achieving a score of 0.90 on GenEval, 86.9 on DPG and 10.76 of HPSv3.</p>\n","updatedAt":"2026-06-30T02:58:47.867Z","author":{"_id":"6310531914aa81e1044363ed","avatarUrl":"/avatars/ae7767e591cb7199ea2f62d2db89fc7f.svg","fullname":"Shufan Li","name":"jacklishufan","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":8,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.8693538904190063},"editors":["jacklishufan"],"editorAvatarUrls":["/avatars/ae7767e591cb7199ea2f62d2db89fc7f.svg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2606.29814","authors":[{"_id":"6a43305c763f63ca3757e8ed","name":"Shufan Li","hidden":false},{"_id":"6a43305c763f63ca3757e8ee","name":"Greg Heinrich","hidden":false},{"_id":"6a43305c763f63ca3757e8ef","name":"Hanrong Ye","hidden":false},{"_id":"6a43305c763f63ca3757e8f0","name":"Yonggan Fu","hidden":false},{"_id":"6a43305c763f63ca3757e8f1","name":"Aditya Grover","hidden":false},{"_id":"6a43305c763f63ca3757e8f2","name":"Jan Kautz","hidden":false},{"_id":"6a43305c763f63ca3757e8f3","name":"Pavlo Molchanov","hidden":false}],"publishedAt":"2026-06-29T00:00:00.000Z","submittedOnDailyAt":"2026-06-30T00:00:00.000Z","title":"Nemotron-Labs-Diffusion-Image: Advancing Masked Discrete Diffusion for High-Resolution Image Synthesis","submittedOnDailyBy":{"_id":"6310531914aa81e1044363ed","avatarUrl":"/avatars/ae7767e591cb7199ea2f62d2db89fc7f.svg","isPro":false,"fullname":"Shufan Li","user":"jacklishufan","type":"user","name":"jacklishufan"},"summary":"We propose Nemotron-Labs-Diffusion-Image, a state-of-the-art masked discrete diffusion model (MDM) for high-resolution text-to-image synthesis. Compared with prior work on masked image generation, Nemotron-Labs-Diffusion-Image addresses two key challenges. First, unlike continuous diffusion models which progressively refine latent representations across the entire image, standard MDMs lack self-correcting capability because discrete tokens cannot be modified once they are unmasked. Second, although increasing the vocabulary size of discrete image tokenizers improves reconstruction fidelity, it introduces optimization difficulties for generative modeling as the per-token training signal becomes increasingly sparse. To address the first challenge, Nemotron-Labs-Diffusion-Image incorporates a token-editing mechanism that enables the model to dynamically revise already-unmasked tokens during inference, similar to how a sculptor iteratively refines their work. To tackle the second challenge, we propose a Grouped Cross-Entropy (GCE) objective that assigns positive learning signals to tokens neighboring the ground truth in embedding space, thereby alleviating signal sparsity. To further improve training efficiency, we implement a custom fused operator for GCE that significantly reduces VRAM usage in large-vocabulary settings. Experimental results demonstrate that these innovations substantially improve both training efficiency and image fidelity of masked discrete image generators, achieving a score of 0.90 on GenEval, 86.9 on DPG and 10.76 of HPSv3.","upvotes":6,"discussionId":"6a43305c763f63ca3757e8f4","ai_summary":"A masked discrete diffusion model for text-to-image synthesis that addresses limitations in token refinement and training efficiency through novel mechanisms and optimizations.","ai_keywords":["masked discrete diffusion model","text-to-image synthesis","discrete tokens","token-editing mechanism","Grouped Cross-Entropy","embedding space","VRAM usage","training efficiency","image fidelity","GenEval","DPG","HPSv3"],"ai_summary_model":"Qwen/Qwen2.5-Coder-32B-Instruct","organization":{"_id":"60262b67268c201cdc8b7d43","name":"nvidia","fullname":"NVIDIA","avatar":"https://cdn-avatars.huggingface.co/v1/production/uploads/65df9200dc3292a8983e5017/Vs5FPVCH-VZBipV3qKTuy.png"}},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"6310531914aa81e1044363ed","avatarUrl":"/avatars/ae7767e591cb7199ea2f62d2db89fc7f.svg","isPro":false,"fullname":"Shufan Li","user":"jacklishufan","type":"user"},{"_id":"6039478ab3ecf716b1a5fd4d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6039478ab3ecf716b1a5fd4d/_Thy4E7taiSYBLKxEKJbT.jpeg","isPro":true,"fullname":"taesiri","user":"taesiri","type":"user"},{"_id":"6a2da6c8ca070ee12c6e396c","avatarUrl":"/avatars/0355287dcabaa67dbc7f0b10b87451f9.svg","isPro":false,"fullname":"Joe Mama","user":"JoeMama123123123","type":"user"},{"_id":"66f8689725464a7989b75845","avatarUrl":"/avatars/43a61a528c5779103eaf5687ba44ee14.svg","isPro":false,"fullname":"Jiarui Yao","user":"FlippyDora","type":"user"},{"_id":"65c4eb7cd1dcbd30d86febec","avatarUrl":"/avatars/001c8f02e8ce794b2c21883628b2da72.svg","isPro":false,"fullname":"free-bit","user":"free-bit","type":"user"},{"_id":"69a5cba5ee290d6bb49457b8","avatarUrl":"/avatars/f80c17c13d6baf6bcd375d31efe21116.svg","isPro":true,"fullname":"Darrow O'Lykos","user":"darrowoflykos","type":"user"}],"acceptLanguages":["en"],"dailyPaperRank":0,"organization":{"_id":"60262b67268c201cdc8b7d43","name":"nvidia","fullname":"NVIDIA","avatar":"https://cdn-avatars.huggingface.co/v1/production/uploads/65df9200dc3292a8983e5017/Vs5FPVCH-VZBipV3qKTuy.png"},"markdownContentUrl":"https://huggingface.co/buckets/huggingchat/papers-content/resolve/2606/2606.29814.md","query":{}}">

Papers

arxiv:2606.29814

Nemotron-Labs-Diffusion-Image: Advancing Masked Discrete Diffusion for High-Resolution Image Synthesis

Published on Jun 29

· Submitted by

Shufan Li on Jun 30

NVIDIA

Upvote

Authors:

Abstract

A masked discrete diffusion model for text-to-image synthesis that addresses limitations in token refinement and training efficiency through novel mechanisms and optimizations.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

View arXiv page View PDF Add to collection

Community

jacklishufan

Paper submitter about 22 hours ago

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Get this paper in your agent:

hf papers read 2606.29814

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2606.29814 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2606.29814 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2606.29814 in a Space README.md to link it from this page.

Collections including this paper 1

Discussion (0)

No comments yet. Sign in and be the first to say something.

Nemotron-Labs-Diffusion-Image: Advancing Masked Discrete Diffusion for High-Resolution Image Synthesis

Abstract

Community

Models citing this paper 0

Datasets citing this paper 0

Spaces citing this paper 0

Collections including this paper 1

Discussion (0)

More from Hugging Face Daily Papers