Hugging Face Daily Papers · June 9, 2026 · 6 min read

SDR: Set-Distance Rewards for Radiology Report Generation

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Reinforcement learning with verifiable rewards has rapidly advanced reasoning in vision-language models. For chest X-ray report generation, however, the standard rewards (exact-match accuracy and step-level process rewards) are a poor fit, because a report consists of unordered and orthogonal findings rather than a causal reasoning chain. We address this gap with a set-based view: each report is split into sentences, and each sentence is embedded by a frozen sentence transformer, yielding an unordered set of embeddings. We propose using set-to-set distances between the generated and reference embedding sets as continuous, permutation-invariant rewards. Across two datasets and three vision-language models (Qwen3-VL-2B/4B, Gemma3-4B), post-training via GRPO with set-distance rewards consistently outperforms both supervised fine-tuning and exact-match GRPO on all headline metrics, with average relative improvements of 6.80% on BERTScore, 7.82% on RadGraph F1, and 4.45% on CheXbert F1. The same set distances also enable test-time best-of-N selection: scoring candidates by their distance to training-report embeddings outperforms random selection on our trained models as well as on three closed-source LLMs (Mistral-Small, Gemini-2.5 Flash-Lite, GPT-4o-mini), with an average relative improvement of 16.4% on BERTScore. Used as a streaming signal, the distances support a more efficient form of test-time scaling: pruning low-scoring candidates mid-generation reduces generated tokens by over 50% while preserving the Findings quality of full best-of-N selection. Together, these results establish set-distance rewards as a unified signal for both post-training and test-time scaling in chest X-ray report generation.
Our code is publicly available at https://anonymous.4open.science/r/Set-Distance-Rewards-CXR-BFDA.
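
The set-based reward described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say which set-to-set distance is used, so a symmetric Chamfer distance over cosine distances is assumed here, and `toy_embed` is a hypothetical hashed bag-of-words stand-in for the frozen sentence transformer so that the example runs with the standard library alone. The function names are illustrative.

```python
import math
import re
import zlib


def split_sentences(report):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", report.strip())
    return [p for p in parts if p]


def toy_embed(sentence, dim=64):
    """Hypothetical stand-in for a frozen sentence transformer.

    Hashes each lowercase token into one of `dim` buckets and L2-normalizes.
    """
    vec = [0.0] * dim
    for tok in re.findall(r"[a-z0-9]+", sentence.lower()):
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def cosine_distance(u, v):
    """Cosine distance between two unit-norm vectors."""
    return 1.0 - sum(x * y for x, y in zip(u, v))


def chamfer_distance(set_a, set_b):
    """Symmetric Chamfer distance (an assumed choice of set-to-set distance)."""
    a_to_b = sum(min(cosine_distance(a, b) for b in set_b) for a in set_a) / len(set_a)
    b_to_a = sum(min(cosine_distance(a, b) for a in set_a) for b in set_b) / len(set_b)
    return 0.5 * (a_to_b + b_to_a)


def set_distance_reward(generated, reference, embed=toy_embed):
    """Continuous reward: the negated set distance, so closer reports score higher."""
    gen = [embed(s) for s in split_sentences(generated)]
    ref = [embed(s) for s in split_sentences(reference)]
    if not gen or not ref:
        # Empty report: return the floor for this non-negative toy embedding.
        return -1.0
    return -chamfer_distance(gen, ref)


reference = "No pleural effusion. Heart size is normal. Lungs are clear."
reordered = "Lungs are clear. No pleural effusion. Heart size is normal."
unrelated = "Large right pneumothorax. Cardiomegaly is present."

print(set_distance_reward(reordered, reference))
print(set_distance_reward(unrelated, reference))
```

Because every sentence is matched to its nearest neighbour in the other set, reordering the sentences of a report leaves the reward unchanged, which is the permutation invariance the abstract refers to. In a GRPO setup this scalar would take the place of an exact-match reward for each sampled report.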

arxiv:2606.00440

Published on May 30, 2026 · Submitted by Halil Ibrahim Gulluk on Jun 9, 2026 · Stanford University

Authors: Halil Ibrahim Gulluk, Max Van Puyvelde, Wim Van Criekinge, Olivier Gevaert

Abstract

Set-based rewards using embedding distances improve chest X-ray report generation by enabling effective post-training and test-time selection without requiring causal reasoning structures.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct
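
Test-time selection with the same kind of signal can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: candidates are scored by the mean cosine distance from each candidate sentence to its nearest sentence in a pooled bank of training-report sentences (a one-directional distance chosen here for simplicity), and `toy_embed` is a hypothetical hashed bag-of-words stand-in for a frozen sentence transformer. All function names are illustrative.

```python
import math
import re
import zlib


def split_sentences(report):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", report.strip())
    return [p for p in parts if p]


def toy_embed(sentence, dim=64):
    """Hypothetical stand-in for a frozen sentence transformer (unit-norm output)."""
    vec = [0.0] * dim
    for tok in re.findall(r"[a-z0-9]+", sentence.lower()):
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def distance_to_bank(candidate, bank):
    """Mean distance from each candidate sentence to its nearest bank embedding."""
    sentences = split_sentences(candidate)
    if not sentences:
        return 1.0  # worst score for this non-negative toy embedding
    total = 0.0
    for sentence in sentences:
        emb = toy_embed(sentence)
        total += min(1.0 - sum(x * y for x, y in zip(emb, ref)) for ref in bank)
    return total / len(sentences)


def best_of_n(candidates, training_reports):
    """Return the candidate whose sentences lie closest to the training bank."""
    bank = [toy_embed(s) for r in training_reports for s in split_sentences(r)]
    if not bank:
        raise ValueError("training_reports must contain at least one sentence")
    return min(candidates, key=lambda c: distance_to_bank(c, bank))


training_reports = [
    "Lungs are clear. No pleural effusion.",
    "Heart size is normal. No pneumothorax.",
]
candidates = [
    "The weather is sunny today.",
    "Lungs are clear. Heart size is normal.",
]
print(best_of_n(candidates, training_reports))
```

No reference report for the test image is needed, since the score depends only on the candidate and on embeddings of training reports. The streaming variant mentioned in the abstract would apply a score of this kind to partial generations and stop low-scoring candidates early; that step is not shown here.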
