TRIAGE: Role-Typed Credit Assignment for Agentic Reinforcement Learning
Authors: Yuanda Xu, Zhengze Zhou, Hejian Sang, Xiaomin Li, Jiaxin Zhang, Xinchen Du, Zhipeng Wang, Alborz Geramifard
Organization: LinkedIn
Published: 2026-06-30 (arXiv: 2606.32017)
Abstract
TRIAGE introduces a role-typed credit assignment framework that enhances agentic reinforcement learning by providing more nuanced credit assignment than standard GRPO methods.
Agentic reinforcement learning requires assigning credit to environment-facing actions such as searches, clicks, edits, navigation commands, and object interactions. Standard GRPO uses the final verifier outcome as a uniform advantage over all action tokens. This outcome signal is useful but structurally incomplete: it punishes useful exploration in failed rollouts and reinforces redundant or regressive actions in successful rollouts. We propose TRIAGE, a role-typed credit assignment framework that adds a semantic role axis to outcome credit. A structured judge classifies each segment as decisive progress, useful exploration, no-progress infrastructure, or regression, and a fixed role-conditioned rule maps these labels to bounded segment-level process rewards. This keeps verifier outcomes as the source of optimization direction while correcting the two main blind spots of outcome-only credit. We further show that role-conditioned credit is the optimal segment-level correction expressible from role labels alone -- a projection of the per-segment advantage residual onto the role variable -- so that the fixed role constants reduce advantage estimation error whenever the judge is reliable, and we connect this to lower-variance policy gradients. Across ALFWorld, Search-QA, and WebShop, TRIAGE improves success rates over GRPO for two policy models and outperforms both a scalar judge-derived process reward and an outcome-supervised shared-backbone value baseline. Ablations show that the gain comes from role typing rather than merely adding dense rewards: reliable detection of regression inside successful trajectories is the dominant contributor, while exploration credit provides a consistent secondary gain; on completed ALFWorld and WebShop rollouts, TRIAGE also reduces environment-facing turns by an additional 10.4% and 14.8% relative to GRPO.
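The projection claim in the abstract can be stated compactly. The following is a sketch in notation assumed here, not taken from the paper: $A_k$ is the true advantage of segment $k$, $\hat{A}$ is the uniform outcome advantage of its rollout, $\delta_k = A_k - \hat{A}$ is the per-segment residual, and $Z_k$ is the role label the judge assigns to the segment.

```latex
\begin{align}
c^{*} &= \arg\min_{c:\,\mathcal{Z}\to\mathbb{R}} \mathbb{E}\big[(\delta_k - c(Z_k))^2\big]
  \;\Longrightarrow\; c^{*}(z) = \mathbb{E}[\delta_k \mid Z_k = z], \\
\mathbb{E}\big[(\delta_k - c^{*}(Z_k))^2\big]
  &= \mathbb{E}[\delta_k^2] - \mathbb{E}\big[c^{*}(Z_k)^2\big] \;\le\; \mathbb{E}[\delta_k^2], \\
\mathbb{E}\big[(\delta_k - c(Z_k))^2\big]
  &= \mathbb{E}\big[(\delta_k - c^{*}(Z_k))^2\big] + \mathbb{E}\big[(c^{*}(Z_k) - c(Z_k))^2\big]
  \quad \text{for any role-only rule } c.
\end{align}
```

Under these assumptions, the best correction that depends on the role alone is the conditional mean of the residual given the role, and applying it never increases the mean squared advantage error. The last line shows that any other fixed set of role constants still helps as long as it sits closer to $c^{*}$ than the zero correction does, which is one way to read the abstract's condition that the judge be reliable.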
Community
TRIAGE (Role-Typed Credit Assignment for Agentic Reinforcement Learning) is a framework designed to improve credit assignment in agentic RL by adding a semantic "role" axis to trajectory-level rewards. It addresses the "blind spots" of standard Group Relative Policy Optimization (GRPO), which uniformly rewards or punishes all actions in a trajectory based solely on the final outcome.
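The contrast described above, uniform outcome credit versus credit with a role axis, can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the role constants in `ROLE_BONUS`, the function names, and the choice to add the role bonus directly to the outcome advantage are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev

# Hypothetical role constants (illustrative values only, not from the paper).
# They are bounded, so the outcome term still sets the overall direction.
ROLE_BONUS = {
    "decisive_progress": 0.5,
    "useful_exploration": 0.2,
    "no_progress_infrastructure": 0.0,
    "regression": -0.5,
}


def grpo_advantages(outcomes, eps=1e-6):
    """Group-relative outcome advantage: one scalar per rollout.

    Every action token in a rollout shares this value, which is the
    uniform credit that the comment above calls a blind spot.
    """
    mu = mean(outcomes)
    sigma = pstdev(outcomes)
    return [(r - mu) / (sigma + eps) for r in outcomes]


def role_typed_advantages(outcome_adv, roles):
    """Per-segment advantage: outcome advantage plus a bounded role bonus.

    Additive combination is an assumption of this sketch.
    """
    return [outcome_adv + ROLE_BONUS[role] for role in roles]


# A group of four rollouts with binary verifier outcomes.
adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])

# Failed rollout: exploration is penalized less than regression.
failed = role_typed_advantages(adv[1], ["useful_exploration", "regression"])

# Successful rollout: a regressive segment earns less than decisive progress.
succeeded = role_typed_advantages(adv[0], ["decisive_progress", "regression"])

print(failed, succeeded)
```

With uniform credit, both segments of the failed rollout would receive the same negative advantage; here the exploratory segment is penalized less, and the regressive segment inside the successful rollout is rewarded less than the decisive one.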
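This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- StepOPSD: Step-Aware Online Preference Distillation for Agent Reinforcement Learning (2026), https://huggingface.co/papers/2605.27140
- Semantic Consistency Policy Optimization for Reinforcement Learning of LLM Agents (2026), https://huggingface.co/papers/2606.25852
- Keep Policy Gradient in Charge: Sibling-Guided Credit Distillation for Long-Horizon Tool-Use Agents (2026), https://huggingface.co/papers/2606.12634
- Self-Induced Outcome Potential: Turn-Level Credit Assignment for Agents without Verifiers (2026), https://huggingface.co/papers/2605.04984
- What and When to Distill: Selective Hindsight Distillation for Multi-Turn Agents (2026), https://huggingface.co/papers/2605.19447
- SD-Search: On-Policy Hindsight Self-Distillation for Search-Augmented Reasoning (2026), https://huggingface.co/papers/2605.18299
- TACO: Tool-Augmented Credit Optimization for Agentic Tool Use (2026), https://huggingface.co/papers/2606.30251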
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend