Hugging Face Daily Papers · 6 min read

DataEvolver: Self-Evolving Multi-Agent Data Construction for Text-Rich Image Generation

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2606.31537

Published on Jun 30 · Submitted by caoshuo on Jul 1
Authors: Siyu Yan, Yizhen Gao, Yilin Wang, Dongxing Mao, Alex Jinpeng Wang

Abstract

DataEvolver is a self-evolving multi-agent framework that improves text-rich image generation by leveraging feedback from rejected samples to iteratively enhance data quality.

Text-rich image generation is one of the most challenging settings in image generation, since models must simultaneously produce visually realistic images and render legible, semantically aligned, and layout-consistent text. Existing data pipelines usually follow a static crawl-filter-freeze paradigm. They collect candidate samples, filter them once, and freeze the accepted data for training. However, rejected samples are usually discarded, although they often contain useful failure signals such as OCR errors and semantic mismatches. As a result, later construction rounds may repeat the same failure modes.

To address these limitations, we propose DataEvolver, a self-evolving multi-agent framework for text-rich image data construction. DataEvolver treats data construction as feedback-driven construction policy evolution. A Retriever collects candidate samples, a Verifier assigns quality scores and rejection causes, a Critic summarizes round-level feedback into semantic feedback, and a Generator completes under-covered regions through targeted synthesis. The updated feedback memory then guides the next construction round. Experiments on text-rich image generation benchmarks show that DataEvolver produces more useful training data than fixed-dataset baselines under matched data budgets. At the 0.75M scale on PixArt-alpha, DataEvolver improves OCR-F1 over the strongest baseline by 85.3 percent on TextScenesHQ and 35.3 percent on LongTextBench. The improvements are consistent across both evaluated benchmarks and also transfer to Show-o2, indicating that the benefit of DataEvolver is not tied to a single downstream generator.

These results suggest that rejected samples can provide actionable feedback for improving text-rich image data construction.
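The four roles named in the abstract can be read as one feedback loop per construction round. The sketch below is only an illustration of that control flow under strong simplifying assumptions, not the authors' implementation: the `Sample` and `FeedbackMemory` types, every function name, the string-equality "OCR check", the substring "semantic check", and the per-domain notion of coverage are all hypothetical stand-ins chosen to keep the example self-contained.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Sample:
    caption: str   # prompt describing the image
    text: str      # text the image is supposed to contain
    ocr_text: str  # text read back by OCR (hard-coded here, no real OCR call)
    domain: str    # coarse content category, e.g. "poster" (an assumption)


@dataclass
class FeedbackMemory:
    rejection_causes: Counter = field(default_factory=Counter)
    under_covered: list = field(default_factory=list)


def retrieve(pool, memory, k):
    """Retriever: pick k candidates, previously under-covered domains first."""
    ranked = sorted(pool, key=lambda s: s.domain not in memory.under_covered)
    return ranked[:k]


def verify(sample):
    """Verifier: return (score, cause); cause is None for accepted samples."""
    if sample.ocr_text != sample.text:
        return 0.0, "ocr_error"
    if sample.text not in sample.caption:
        return 0.0, "semantic_mismatch"
    return 1.0, None


def criticize(rejected, accepted, target_domains, memory):
    """Critic: fold this round's rejection causes and coverage gaps into memory."""
    memory.rejection_causes.update(cause for _, cause in rejected)
    covered = {s.domain for s in accepted}
    memory.under_covered = [d for d in target_domains if d not in covered]
    return memory


def generate(memory):
    """Generator: synthesize one clean placeholder sample per under-covered domain."""
    return [
        Sample(f'a {d} that says "HELLO"', "HELLO", "HELLO", d)
        for d in memory.under_covered
    ]


def run_round(pool, accepted, target_domains, memory, k):
    """One construction round: retrieve, verify, criticize, then fill gaps."""
    rejected = []
    for sample in retrieve(pool, memory, k):
        _, cause = verify(sample)
        if cause is None:
            accepted.append(sample)
        else:
            rejected.append((sample, cause))  # kept as feedback, not discarded
    criticize(rejected, accepted, target_domains, memory)
    accepted.extend(generate(memory))
    return accepted, memory
```

The point of the sketch is the contrast with crawl-filter-freeze: rejected samples are not dropped silently but recorded with a cause, and the memory returned by one call to `run_round` is passed into the next call, where it changes what the hypothetical `retrieve` step looks at first.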

Community

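This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* MemoGen: Can Past Experience Improve Future Text-to-Image Generation? (2026), https://huggingface.co/papers/2606.03243
* EPIC: Efficient Predicate-Guided Inference-Time Control for Compositional Text-to-Image Generation (2026), https://huggingface.co/papers/2605.11722
* Qwen-Image-Agent: Bridging the Context Gap in Real-World Image Generation (2026), https://huggingface.co/papers/2606.26907
* Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs (2026), https://huggingface.co/papers/2605.30611
* TextAlign: Preference Alignment for Text Rendering with Hierarchical Rewards (2026), https://huggingface.co/papers/2605.19320
* SCOPE: Structured Decomposition and Conditional Skill Orchestration for Complex Image Generation (2026), https://huggingface.co/papers/2605.08043
* Large Language Models are Universal Reasoners for Visual Generation (2026), https://huggingface.co/papers/2605.04040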
