Hugging Face Daily Papers · July 1, 2026 · 9 min read

Hierarchical Experimentalist Agents

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.


Large language models (LLMs) are increasingly used to take actions in the real world and support human decision-making, yet most agents rely on parametric knowledge, fixed post-training data, retrieval, or search. This paradigm breaks down in novel domains and for sophisticated queries that cannot be answered from prior knowledge alone. Knowing the laws of physics, for instance, does not by itself enable LLMs to answer queries or complete long-horizon tasks in a complex physical system. To address this, we introduce Hierarchical Experimentalist Agents (HExA), an in-context self-improvement framework to learn from active experimentation. HExA iteratively designs and refines query-relevant experiments, learns a reusable library of composable skills from experience, and integrates experimental evidence to answer queries or take actions. HExA is training-free, compatible with any black-box model, and does not require external supervision, oracles, or offline data. To evaluate active experimentation, we introduce Interphyre, a tool-calling benchmark built on the PHYRE 2D procedural physics environment, where agents propose interventions and test hypotheses through simulation APIs. Experiments show that current LLM agents struggle in these settings, especially on the hardest levels of Interphyre. Claude Sonnet 4.6 achieves only 2% success, while HExA improves the same model to up to 77% success. HExA also improves open-weight models and outperforms agentic baselines such as ReAct and Reflexion. Moreover, using only skills learned from easier levels and transferred without active experimentation, HExA achieves 44% success, demonstrating the reusability and generalization of its learned skills. 
Overall, HExA shows that learning through active experimentation can help agents discover useful knowledge, acquire reusable skills, and make efficient progress on novel long-horizon tasks.
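The abstract describes the HExA loop only at a high level: design experiments, run them through a simulator, integrate the evidence, store what worked as reusable skills, and try those skills on new levels before experimenting again. The Python sketch below is a deliberately tiny toy showing only the shape of that loop, not the authors' method. `ToyLevel`, `SkillLibrary`, `experiment`, the one-dimensional "drift" dynamics, and the offset-style skills are all hypothetical. They do not reflect the Interphyre or PHYRE APIs, and the sketch has no LLM, no hierarchy, and no skill composition.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Goal = Tuple[float, float]


@dataclass
class ToyLevel:
    """Hypothetical 1-D stand-in for a simulated physics level (not Interphyre/PHYRE)."""
    drift: float  # hidden dynamics the agent must discover by experimenting
    goal: Goal    # success interval for the landing position

    def simulate(self, x: float) -> float:
        # Toy "simulation API": a ball released at x lands at x + drift.
        return x + self.drift

    def succeeds(self, x: float) -> bool:
        lo, hi = self.goal
        return lo <= self.simulate(x) <= hi


@dataclass
class SkillLibrary:
    # Hypothetical library: skill name -> function mapping a goal to an action.
    skills: Dict[str, Callable[[Goal], float]] = field(default_factory=dict)


def experiment(level: ToyLevel, library: SkillLibrary,
               budget: int = 20) -> Tuple[Optional[float], int]:
    """Return (action or None, number of new experiments run)."""
    # 1. Transfer: try stored skills first, with no new experiments.
    for skill in library.skills.values():
        x = skill(level.goal)
        if level.succeeds(x):
            return x, 0

    evidence: List[Tuple[float, float]] = []
    target = sum(level.goal) / 2
    for step in range(budget):
        # 2. Design an experiment (here: a fixed probe schedule) and run it.
        probe = float(step)
        evidence.append((probe, level.simulate(probe)))
        # 3. Integrate evidence: estimate drift as the mean observed offset.
        drift = sum(land - p for p, land in evidence) / len(evidence)
        x = target - drift
        if level.succeeds(x):
            # 4. Distil the working rule into a reusable, goal-parameterised skill.
            library.skills[f"offset_{drift:+.2f}"] = (
                lambda goal, d=drift: sum(goal) / 2 - d)
            return x, len(evidence)
    return None, len(evidence)


lib = SkillLibrary()
print(experiment(ToyLevel(drift=0.7, goal=(4.0, 4.5)), lib))   # needs one probe
print(experiment(ToyLevel(drift=0.7, goal=(1.0, 1.2)), lib))   # reuses the stored skill
print(experiment(ToyLevel(drift=-2.0, goal=(5.0, 6.0)), lib))  # must experiment again
```

In this toy, the first level needs one probe and leaves a skill behind. A second level with the same hidden dynamics is then solved with zero new experiments, loosely mirroring the "skills transferred without active experimentation" setting, while a level with different dynamics falls back to experimenting.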


arxiv:2606.29315


Published on Jun 28 · Submitted by Abhranil Chandra on Jul 1

University of Massachusetts Amherst


Authors: Abhranil Chandra, Sankaran Vaidyanathan, Utsav Dhanuka, Varun Gandhi, Scott Niekum

Abstract

HExA enables large language models to improve through active experimentation and skill learning in novel domains without requiring training or external supervision.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

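arXiv: arxiv.org/abs/2606.29315

Project page: https://general-exp-3-continual-learning-agent.github.io/HeXA/

GitHub: https://github.com/General-Exp-3-Continual-Learning-Agent/HeXA-Hierarchical-Experimentalist-Agents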

Community

abhranil14 (paper submitter)

librarian-bot

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

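The following papers were recommended by the Semantic Scholar API:

* Harnessing LLM Agents with Skill Programs (2026), https://huggingface.co/papers/2605.17734
* Learning from Failure: Inference-Time Self-Improvement for Computer-Use Agents (2026), https://huggingface.co/papers/2606.31270
* OpenSkill: Open-World Self-Evolution for LLM Agents (2026), https://huggingface.co/papers/2606.06741
* Training Language Agents to Learn from Experience (2026), https://huggingface.co/papers/2605.20477
* OpenClaw-Skill: Collective Skill Tree Search for Agentic Large Language Models (2026), https://huggingface.co/papers/2606.16774
* Test-Time Learning with an Evolving Library (2026), https://huggingface.co/papers/2605.14477
* Robot Self-Improvement via Human-Video Dynamics Models (2026), https://huggingface.co/papers/2606.21406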

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Get this paper in your agent:

hf papers read 2606.29315

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

Datasets citing this paper 0

Spaces citing this paper 0

Collections including this paper 0
