Hugging Face Daily Papers · June 30, 2026 · 12 min read

Agentic Abstention: Do Agents Know When to Stop Instead of Act?

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.



arxiv:2606.28733

Agentic Abstention: Do Agents Know When to Stop Instead of Act?

Published on Jun 27, 2026 · Submitted by Han Luo on Jun 30, 2026

#1 Paper of the day

University of Washington

Upvotes: 120

Authors: Han Luo, Bingbing Wen, Lucy Lu Wang

Abstract

Agentic abstention involves determining when an AI agent should cease interaction under uncertainty, requiring sequential decision-making across multiple environments and task types.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

LLM agents are expected to act over multiple turns, using search, browsing interfaces, and terminal tools to complete user goals. Yet not every goal is well specified or achievable in the available environment. In such cases, a reliable agent should recognize that further interaction is unlikely to help and abstain from additional tool calls. We define Agentic Abstention, the problem of deciding when an agent should stop acting under uncertainty. Unlike standard LLM abstention, which is usually evaluated as a single-turn answer-or-abstain decision, agentic abstention is a sequential decision problem: an agent can answer, abstain, or gather more information at each turn, and the need to abstain may only become clear after interacting with the environment. We study this problem across web shopping, terminal environments, and question answering, evaluating 13 LLM-as-agent systems and 2 agent scaffolds on more than 28,000 tasks. Our results show that the main challenge is not only whether agents can abstain, but also when they abstain. Some agents never abstain when they should, while others do so only after many unnecessary interactions. This gap is especially large on tasks where the instruction appears feasible until the environment reveals otherwise (e.g., no valid result matches the instruction). We further find that model scale, reasoning, and agent scaffolding affect abstention in different ways, where larger or more capable models sometimes perform worse at timely abstention. Finally, we introduce CONVOLVE, a context engineering method for improving agentic abstention that distills full interaction trajectories into reusable stopping rules. On WebShop, CONVOLVE substantially improves timely abstention without updating model parameters, raising Llama-3.3-70B's timely recall rate from 26.7 to 57.4. Our dataset and code are available at https://lhannnn.github.io/agentic-abstention
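To make the sequential framing concrete, here is a minimal sketch of an agent loop in which the policy chooses among answering, abstaining, or gathering more information at each turn. It is only an illustration under stated assumptions: `Action`, `Episode`, `run_episode`, `empty_search`, and `cautious_policy` are hypothetical names, the environment is a toy that always returns "no results", and none of this is taken from the paper's benchmark or code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Action(Enum):
    ANSWER = "answer"
    ABSTAIN = "abstain"
    GATHER = "gather"  # take one more tool call / environment step


@dataclass
class Episode:
    final_action: Action  # GATHER here means the step budget ran out
    steps_taken: int      # environment steps taken before the terminal decision


def run_episode(policy: Callable[[List[str]], Action],
                observe: Callable[[int], str],
                max_steps: int = 10) -> Episode:
    """Roll out a policy that picks answer / abstain / gather on every turn."""
    history: List[str] = []
    for step in range(max_steps):
        action = policy(history)
        if action is not Action.GATHER:
            return Episode(action, step)
        history.append(observe(step))
    # Budget exhausted without ever answering or abstaining.
    return Episode(Action.GATHER, max_steps)


def empty_search(step: int) -> str:
    """Toy environment (assumption): every search comes back empty."""
    return "no results"


def cautious_policy(history: List[str]) -> Action:
    """Toy policy (assumption): abstain after two consecutive empty searches."""
    if len(history) >= 2 and history[-2:] == ["no results", "no results"]:
        return Action.ABSTAIN
    return Action.GATHER


episode = run_episode(cautious_policy, empty_search)
print(episode.final_action.value, episode.steps_taken)  # abstain 2
```

Recording `steps_taken` alongside the terminal action is what separates "abstained at all" from "abstained early", which is the distinction the abstract emphasizes.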


Community

sxcn

Paper submitter about 16 hours ago

We introduce Agentic Abstention: the problem of deciding when an LLM agent should stop acting under uncertainty.

Many agent benchmarks focus on successful task completion, but real-world tasks are often ambiguous, underspecified, or infeasible in the available environment. In such cases, a reliable agent should know when further tool use is unlikely to help and abstain instead of continuing unnecessary actions.

We evaluate 13 LLM-as-agent systems and 2 agent scaffolds on more than 28,000 tasks across web shopping, terminal environments, and question answering. Our findings show that the main challenge is not only whether agents can abstain, but when they abstain.

We also propose CONVOLVE, a context engineering method that improves timely abstention by converting interaction trajectories into reusable stopping rules.

Happy to hear thoughts from the community!

O96a

about 13 hours ago

The concept of "Agentic Abstention" is a critical missing piece for anyone actually deploying agents in production. Most current evaluations focus on the "happy path" where the goal is achievable, but in the real world, the most expensive failures happen when an agent loops indefinitely on an impossible task. Shifting the evaluation from a single-turn "I don't know" to a sequential decision process is the right engineering move. If we can't quantify the confidence threshold for when to stop, we're just gambling with our token budget. I'm interested to see if this framework can be integrated into a dynamic cost-benefit analysis for tool calls.

sxcn

about 12 hours ago

Thanks a lot! I really agree with this perspective. In production settings, abstention is not just “I don’t know,” but a sequential decision about whether another action is still worth its cost under uncertainty.

The dynamic cost-benefit angle is a very natural next step. In our current work, we focus on whether and when agents should stop, but this could be extended by explicitly modeling the expected utility of the next tool call, including information gain, recovery probability, latency, token/tool cost, and the risk of compounding errors.

I think this would make agentic abstention much more deployment-oriented, and it’s definitely a direction we’re excited to explore 😃
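The cost-benefit extension described in this reply could be sketched roughly as follows. This is a hypothetical formulation, not something the authors have published: the `ToolCallEstimate` fields, the additive utility, and the zero threshold are all assumptions chosen to mirror the factors listed above (information gain, recovery probability, latency, token/tool cost, and the risk of compounding errors).

```python
from dataclasses import dataclass


@dataclass
class ToolCallEstimate:
    """Hypothetical per-call estimates; all values share one arbitrary unit."""
    p_recovery: float        # chance this call makes the task solvable
    task_value: float        # value of completing the task
    info_value: float        # expected value of the information gained
    token_cost: float        # token / tool usage cost of the call
    latency_cost: float      # cost attributed to waiting for the call
    p_compound_error: float  # chance the call makes the state worse
    error_penalty: float     # cost incurred if it does


def expected_utility(e: ToolCallEstimate) -> float:
    """Expected gain of one more tool call minus its expected cost."""
    gain = e.p_recovery * e.task_value + e.info_value
    cost = e.token_cost + e.latency_cost + e.p_compound_error * e.error_penalty
    return gain - cost


def should_continue(e: ToolCallEstimate, threshold: float = 0.0) -> bool:
    """Keep acting only while the next call is expected to pay for itself."""
    return expected_utility(e) > threshold


promising = ToolCallEstimate(0.4, 10.0, 0.3, 0.5, 0.2, 0.05, 4.0)
hopeless = ToolCallEstimate(0.02, 10.0, 0.0, 0.5, 0.2, 0.1, 4.0)
print(should_continue(promising))  # True
print(should_continue(hopeless))   # False
```

In practice the hard part is estimating these quantities, not combining them; the sketch only shows how a stopping decision would follow once estimates exist.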

Yiwei-Sheng

about 10 hours ago

Great paper! I really like how it points out that agent reliability is not only about completing tasks, but also about recognizing when the task may be impossible or underspecified in the first place.

The distinction between abstaining correctly and abstaining at the right time is especially useful. Many agents may eventually stop, but only after taking many unnecessary steps, which existing evaluations often fail to capture. Framing abstention as a sequential decision problem makes this behavior much easier to study and compare across systems.

I’m excited to see more benchmarks evaluate this kind of stopping behavior in tool-using agents.
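The gap between abstaining eventually and abstaining in time can be illustrated with two simple scores. This is a sketch under assumptions: the function names and the step budget are hypothetical, and the second score is not claimed to match the paper's "timely recall rate", whose exact definition is not given on this page.

```python
from typing import List, Tuple

# One entry per task on which abstaining is the correct outcome:
# (did the agent abstain?, steps taken before it stopped)
Run = Tuple[bool, int]


def abstention_recall(runs: List[Run]) -> float:
    """Fraction of should-abstain tasks on which the agent abstained at all."""
    return sum(1 for abstained, _ in runs if abstained) / len(runs)


def recall_within_budget(runs: List[Run], max_steps: int) -> float:
    """Fraction on which the agent abstained within max_steps steps."""
    timely = sum(1 for abstained, steps in runs
                 if abstained and steps <= max_steps)
    return timely / len(runs)


runs = [(True, 2), (True, 9), (False, 15), (True, 3)]
print(abstention_recall(runs))        # 0.75
print(recall_within_budget(runs, 5))  # 0.5
```

The second run abstains, but only after nine steps, so it counts toward the first score and not the second; an evaluation that reports only the first would hide that cost.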

sxcn

about 8 hours ago

Thanks!


EnderPico

about 6 hours ago · edited about 6 hours ago

I've noticed this behavior in agents such as Hermes or OpenClaw, but I could never pinpoint what made them so fixated on doing things even when they were not possible, or find a way to minimize it. Loved this paper and the proposed fix.

sxcn

about 6 hours ago

Thanks a lot! This is exactly the behavior we wanted to study. Many agents look strong on achievable tasks, but when the task is impossible or the environment lacks the needed information, they can keep taking actions as if one more step will fix it.

We hope Agentic Abstention helps make this failure mode more explicit, measurable, and easier to mitigate. Really glad you found the paper useful!


Get this paper in your agent:

hf papers read 2606.28733

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
