Hugging Face Daily Papers · July 1, 2026 · 7 min read

TRIAGE: Role-Typed Credit Assignment for Agentic Reinforcement Learning

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.



arxiv:2606.32017

Published on Jun 30 · Submitted by Hejian Sang on Jul 1


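Authors: Yuanda Xu, Zhengze Zhou, Hejian Sang, Xiaomin Li, Jiaxin Zhang, Xinchen Du, Zhipeng Wang, Alborz Geramifard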

Abstract

AI-generated summary (Qwen/Qwen2.5-Coder-32B-Instruct): TRIAGE introduces a role-typed credit assignment framework that enhances agentic reinforcement learning by providing more nuanced credit assignment than standard GRPO methods.

Agentic reinforcement learning requires assigning credit to environment-facing actions such as searches, clicks, edits, navigation commands, and object interactions. Standard GRPO uses the final verifier outcome as a uniform advantage over all action tokens. This outcome signal is useful but structurally incomplete: it punishes useful exploration in failed rollouts and reinforces redundant or regressive actions in successful rollouts. We propose TRIAGE, a role-typed credit assignment framework that adds a semantic role axis to outcome credit. A structured judge classifies each segment as decisive progress, useful exploration, no-progress infrastructure, or regression, and a fixed role-conditioned rule maps these labels to bounded segment-level process rewards. This keeps verifier outcomes as the source of optimization direction while correcting the two main blind spots of outcome-only credit. We further show that role-conditioned credit is the optimal segment-level correction expressible from role labels alone -- a projection of the per-segment advantage residual onto the role variable -- so that the fixed role constants reduce advantage estimation error whenever the judge is reliable, and we connect this to lower-variance policy gradients. Across ALFWorld, Search-QA, and WebShop, TRIAGE improves success rates over GRPO for two policy models and outperforms both a scalar judge-derived process reward and an outcome-supervised shared-backbone value baseline. Ablations show that the gain comes from role typing rather than merely adding dense rewards: reliable detection of regression inside successful trajectories is the dominant contributor, while exploration credit provides a consistent secondary gain; on completed ALFWorld and WebShop rollouts, TRIAGE also reduces environment-facing turns by an additional 10.4% and 14.8% relative to GRPO.
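To make the mechanism concrete, here is a minimal Python sketch of role-typed credit under stated assumptions. The four role labels come from the abstract, but the reward constants in `ROLE_REWARD`, the additive combination with the outcome advantage, and the `weight` and `bound` parameters are hypothetical, because the abstract does not give them. The structured judge is not modeled; role labels are passed in directly. `role_projection` illustrates the optimality claim in its simplest empirical form: among corrections that depend only on the role label, the per-role mean of the advantage residual minimizes squared error.

```python
from collections import defaultdict
from statistics import fmean

# HYPOTHETICAL values: the abstract names these four roles and says a fixed rule
# maps them to bounded process rewards, but it does not state the constants.
ROLE_REWARD = {
    "decisive_progress": 0.5,
    "useful_exploration": 0.25,
    "no_progress_infrastructure": 0.0,
    "regression": -0.5,
}


def role_typed_advantages(outcome_adv, roles, weight=0.5, bound=1.0):
    """Keep the rollout's outcome advantage and add a bounded role-conditioned term.

    The additive form and the `weight` / `bound` parameters are assumptions.
    """
    return [
        outcome_adv + weight * max(-bound, min(bound, ROLE_REWARD[role]))
        for role in roles
    ]


def role_projection(residuals, roles):
    """Least-squares best correction that depends on the role label only.

    residual = target segment advantage - outcome advantage. The per-role mean of
    the residuals is their projection onto functions of the role variable.
    """
    by_role = defaultdict(list)
    for res, role in zip(residuals, roles):
        by_role[role].append(res)
    return {role: fmean(vals) for role, vals in by_role.items()}


def correction_mse(residuals, roles, correction):
    """Mean squared advantage error left after adding correction[role] to each segment."""
    return fmean(
        (res - correction.get(role, 0.0)) ** 2 for res, role in zip(residuals, roles)
    )


# A failed rollout (outcome advantage -1.0): exploration is penalized less and
# regression more than under uniform outcome-only credit.
adv = role_typed_advantages(-1.0, ["useful_exploration", "regression"])  # [-0.875, -1.25]

# Made-up residuals for illustration only (not data from the paper).
residuals = [0.4, 0.6, -0.8, -1.2]
roles = ["useful_exploration", "useful_exploration", "regression", "regression"]
proj = role_projection(residuals, roles)  # about {"useful_exploration": 0.5, "regression": -1.0}
```

The abstract describes fixed role constants, so `role_projection` is best read as the ideal correction such constants would approximate when the judge is reliable, not as a training step.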


Community

pb09204048

Paper submitter

TRIAGE (Role-Typed Credit Assignment for Agentic Reinforcement Learning) is a framework designed to improve credit assignment in agentic RL by adding a semantic "role" axis to trajectory-level rewards. It addresses the "blind spots" of standard Group Relative Policy Optimization (GRPO), which uniformly rewards or punishes all actions in a trajectory based solely on the final outcome.
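The uniform credit this comment describes can be illustrated with a minimal Python sketch of GRPO's group-relative advantage, broadcast unchanged to every action segment of a rollout. The function names are hypothetical; the sketch normalizes by the population standard deviation plus a small epsilon, and real GRPO implementations differ in such details (some use the sample standard deviation or skip the division).

```python
from statistics import fmean, pstdev


def grpo_advantages(outcomes, eps=1e-6):
    """Group-relative advantage for G rollouts of the same prompt: (R_i - mean) / (std + eps)."""
    mu = fmean(outcomes)
    sigma = pstdev(outcomes)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in outcomes]


def broadcast_to_segments(advantages, segments_per_rollout):
    """Outcome-only credit: every action segment of rollout i receives the same A_i."""
    return [[a] * n for a, n in zip(advantages, segments_per_rollout)]


# Two rollouts of one task: the first fails (R = 0), the second succeeds (R = 1).
outcomes = [0.0, 1.0]
adv = grpo_advantages(outcomes)                   # roughly [-1.0, 1.0]
per_segment = broadcast_to_segments(adv, [3, 4])  # 3 segments vs. 4 segments
```

Because the advantage depends only on the final outcome, the three segments of the failed rollout share one negative value even if some were useful exploration, and the four segments of the successful rollout share one positive value even if some were regressions.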

librarian-bot


This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

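The following papers were recommended by the Semantic Scholar API:

* StepOPSD: Step-Aware Online Preference Distillation for Agent Reinforcement Learning (2026): https://huggingface.co/papers/2605.27140
* Semantic Consistency Policy Optimization for Reinforcement Learning of LLM Agents (2026): https://huggingface.co/papers/2606.25852
* Keep Policy Gradient in Charge: Sibling-Guided Credit Distillation for Long-Horizon Tool-Use Agents (2026): https://huggingface.co/papers/2606.12634
* Self-Induced Outcome Potential: Turn-Level Credit Assignment for Agents without Verifiers (2026): https://huggingface.co/papers/2605.04984
* What and When to Distill: Selective Hindsight Distillation for Multi-Turn Agents (2026): https://huggingface.co/papers/2605.19447
* SD-Search: On-Policy Hindsight Self-Distillation for Search-Augmented Reasoning (2026): https://huggingface.co/papers/2605.18299
* TACO: Tool-Augmented Credit Optimization for Agentic Tool Use (2026): https://huggingface.co/papers/2606.30251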

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Get this paper in your agent:

hf papers read 2606.32017

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0


Datasets citing this paper 0


Spaces citing this paper 0


Collections including this paper 0

