#agents · Agents + tool use
500 articles archived under #agents

arXiv — Machine Learning · research · 2h ago
Play Like Champions: Counterfactual Feedback Generation in Latent Space
arXiv:2607.00190v1 · Announce Type: new
Abstract: Recent advances in reinforcement learning have produced superhuman agents across a wide range of competitive games. As a byproduct, researchers have begun studying how these agents play, extracting behavioral representations,…

arXiv — NLP / Computation & Language · research · 2h ago
EPC: A Standardized Protocol for Measuring Evaluator Preference Dynamics in LLM Agent Systems
arXiv:2607.00297v1 · Announce Type: cross
Abstract: When LLM agents use evaluator feedback to adapt their behavior in closed loops, evaluator biases propagate through the agent's strategy distribution, a phenomenon known as evaluator preference coupling. Prior work has…

arXiv — Machine Learning · research · 2h ago
Distributed Online Bandit Submodular Maximization with Bounded Sampling Violations
arXiv:2607.00680v1 · Announce Type: new
Abstract: We study distributed online submodular maximization under partition matroid constraints, in which multiple agents select a limited number of actions from their own subsets sequentially to maximize the cumulative value of a sequence…

arXiv — Machine Learning · research · 2h ago
Task-Relevant Representation Decoupling for Visual Reinforcement Learning Generalization
arXiv:2607.00796v1 · Announce Type: new
Abstract: Visual Reinforcement Learning (VRL) has achieved considerable success in solving control tasks.
However, generalizing learned policies to new environments remains a major challenge, as agents often overfit to task-irrelevant…

arXiv — Machine Learning · research · 2h ago
Local Motion Matters: A Deconstruct-Recompose Paradigm for Reinforcement Learning Pre-training from Videos
arXiv:2607.00808v1 · Announce Type: new
Abstract: Pre-training on large-scale videos to improve reinforcement learning efficiency is promising yet remains challenging. Existing methods typically treat the agent as an indivisible entity, modeling motion patterns globally. Such…

arXiv — NLP / Computation & Language · research · 2h ago
TRACE: State-Aware Query Processing over Temporal Evidence Graphs for Conversational Data
arXiv:2607.00339v1 · Announce Type: new
Abstract: Conversational data is increasingly used as a persistent source of user state for long-running assistants and AI agents. However, querying this data remains challenging because conversations naturally evolve: plans are revised,…

arXiv — NLP / Computation & Language · research · 2h ago
A Task-State Representation for Long-Horizon Mobile GUI Agents
arXiv:2607.00502v1 · Announce Type: new
Abstract: While long-horizon mobile GUI agents typically rely on thought-action-observation loops, they struggle to separate persistent task states from transient screen observations. As execution histories grow, this entanglement imposes a…

arXiv — NLP / Computation & Language · research · 2h ago
Multi-Turn Agentic Scientific Literature Search via Workflow Induction
arXiv:2607.00597v1 · Announce Type: new
Abstract: Scientific literature search often requires more than retrieving papers from a single query: users' intents are underspecified, preference-dependent, and evolve through interaction.
Existing search agents typically rely on fixed…

arXiv — NLP / Computation & Language · research · 2h ago
Behavior-Adaptive Conversational Agents: Toward a Fluid Personality Framework
arXiv:2607.01034v1 · Announce Type: new
Abstract: Large language model (LLM)-based conversational agents (CAs) are now ubiquitous, creating new opportunities for AI-mediated behavior change. Their capacity to project nuanced personalities and adopt diverse metaphorical roles…

arXiv — NLP / Computation & Language · research · 2h ago
Conversable Complexity: Agentic LLM Collectives as Interpretable Substrates
arXiv:2607.01047v1 · Announce Type: new
Abstract: Complexity and interpretability rarely coincide: systems rich enough for complex behaviours to emerge are usually too opaque to question, while transparent ones are too simple for anything complex to emerge. A single large language…

arXiv — NLP / Computation & Language · research · 2h ago
Learning User-Aware Recall: Personalized Retrieval in Long-Term Conversational Memory
arXiv:2607.00017v1 · Announce Type: cross
Abstract: Long-term conversational agents are expected to remember past interactions, but memory is useful only when the right evidence is recalled for the right user. Existing memory-augmented LLM agents have made progress in building…

arXiv — NLP / Computation & Language · research · 2h ago
From Signals to Structure: How Memory Architecture Drives Language Emergence in LLM Agents
arXiv:2607.00233v1 · Announce Type: cross
Abstract: How do two agents invent a shared language from scratch? In a Lewis signaling game, a sender and receiver must coordinate on a code using only their interaction history.
We study five memory architectures across varying channel…

arXiv — NLP / Computation & Language · research · 2h ago
When Classic Cache Policies Fail: Learning-Augmented Replacement for Semantic Retrieval Buffers
arXiv:2607.00394v1 · Announce Type: cross
Abstract: LLM agents increasingly rely on retrieval buffers to store and reuse past experience, yet the cache management policies governing these buffers remain largely ad hoc. We formalize this as an online semantic cache replacement…

arXiv — NLP / Computation & Language · research · 2h ago
Self-Evolving Agents with Anytime-Valid Certificates
arXiv:2607.00871v1 · Announce Type: cross
Abstract: Self-evolving agents violate the assumption behind most learning-theoretic guarantees: the data, evaluator, components, and hypothesis space are produced by the policy being updated. We present SEA, an architecture that…

arXiv — NLP / Computation & Language · research · 2h ago
Agentic generation of verifiable rules for deterministic, self-expanding reaction classification
arXiv:2607.01061v1 · Announce Type: cross
Abstract: Computer-assisted synthesis planning breaks target molecules into accessible precursors using large libraries of reaction rules that assign each transformation a deterministic, interpretable label. But chemistry is long-tailed,…

arXiv — NLP / Computation & Language · research · 2h ago
OpenReward: Learning to Reward Long-form Agentic Tasks via Reinforcement Learning
arXiv:2510.24636v3 · Announce Type: replace
Abstract: Reward models (RMs) have become essential for aligning large language models (LLMs), serving as scalable proxies for human evaluation in both training and inference.
However, existing RMs struggle on knowledge-intensive and…

arXiv — NLP / Computation & Language · research · 2h ago
Thinking While Speaking: Inference-Time Knowledge Transfer for Responsive and Intelligent Conversational Voice Agents
arXiv:2511.07397v3 · Announce Type: replace
Abstract: Voice agents face a fundamental tension: the reasoning, retrieval, and tool use that make foundation models capable are iterative and slow, while conversational interaction demands responses on a millisecond timescale. Smaller,…

r/MachineLearning · community · 2h ago
SentryCode: Real-time Auditor + Honeytokens for AI Coding Agents [P]
In light of recent privacy concerns arising from local AI coding agents performing telemetry, environmental scanning, and hidden cue fingerprinting, I've open-sourced SentryCode, a kernel-level behavior auditing tool. It logs file/network/cue activity, uses honeypot tokens for…

r/LocalLLaMA · community · 5h ago
I added MTP to local SoTA Agentic Coding Model Ornith 35B FP8 E4M3
Just wanted to share that I was looking for an optimal way to run Ornith 35B in FP8 with E4M3 and MTP with vLLM, but there was no out-of-the-box model with MTP drafter support. So I grafted this new model! It's 18% faster than without MTP and the drafter acceptance rate is not…

Latent.Space · news-outlet · 6h ago
Autoresearch: The feedback loop behind self-improving agents
Introspection co-founder Roland Gavrilescu explains autoresearch, agent “recipes,” self-improving loops, and why humans remain central to the software factory.

r/LocalLLaMA · community · 9h ago
ZCode: New Agentic Code Editor from the Makers of GLM
Submitted by /u/johnnyApplePRNG

Latent.Space · news-outlet · 11h ago
How Cursor deploys AI inside the enterprise
Cursor's Pauline Brunet explains how her team of Forward Deployed Engineers helps organizations implement agents, essentially setting up software factories.

r/LocalLLaMA · community · 11h ago
Open benchmark: how well can multimodal LLMs read a calendar week-view from a screenshot? Humans ~99%, Q4 local models…
Some backstory: I've been working on my local agent (openclaw), and I wanted to give it the skill to reconstruct calendar entries from a photo of the screen. I couldn't get at the calendar through an API (long story), so a photo was the only low-friction way to export the data.…

Hugging Face Daily Papers · research · 11h ago
TRIAGE: Role-Typed Credit Assignment for Agentic Reinforcement Learning
AI summary: TRIAGE introduces a role-typed credit assignment framework that enhances agentic reinforcement learning by providing more nuanced credit assignment than standard GRPO methods.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct
Abstract: Agentic reinforcement learning requires assigning…

TechCrunch — AI · news-outlet · 12h ago
Cloudflare’s new policy pushes AI companies to pay for publishers’ content
Cloudflare is giving AI companies until September 15 to separate web crawlers used for search from those used for AI training and agents, or risk being blocked by default on many publisher sites.

Hugging Face Daily Papers · research · 12h ago
SWE-INTERACT: Reimagining SWE Benchmarks as User-Driven Long-Horizon Coding Sessions
AI summary: SWE-Interact presents a testbed that evaluates coding agents in realistic multi-turn, user-driven software engineering scenarios, revealing significant gaps between single-turn performance and interactive task completion.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct
Abstract: We…

r/LocalLLaMA · community · 13h ago
Plurality Released: fully Free and Open Source AI agents/chatbot platform for local AI
Hello everyone! Some of you might recognize my user from the work I have done on Cosmos Cloud, but today I am here to talk to you about an entirely different project: Plurality.
https://github.com/azukaar/plurality
Plurality has been in development for a bit more than a year and…

NVIDIA Developer Blog · official-blog · 13h ago
Mastering Agentic Techniques: AI Agent Reinforcement Learning
Reinforcement learning (RL) is central to aligning language models, from reinforcement learning from human feedback (RLHF) within AI assistants to newer…

Hugging Face Daily Papers · research · 13h ago
Hierarchical Experimentalist Agents
AI summary: HExA enables large language models to improve through active experimentation and skill learning in novel domains without requiring training or external supervision.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct
Abstract: Large language models (LLMs) are increasingly used to take…

Hugging Face Daily Papers · research · 14h ago
Does VLA Even Know the Basics? Measuring Commonsense and World Knowledge Retention in Vision-Language-Action Models
AI summary: The Act2Answer protocol evaluates embodied vision-language-action models by having agents answer questions through physical actions, revealing knowledge retention and generalization patterns across different semantic categories.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

Hugging Face Daily Papers · research · 14h ago
Lexical Consensus: Grounded Word Learning and Shared Meaning in Artificial Agents
AI summary: Grounded word learning experiments using visual embeddings and lexical learners reveal that perceptual distance, rather than semantic relatedness, determines acquisition success, with distinct patterns in naming and retrieval performance.
Generated by…

r/LocalLLaMA · community · 14h ago
Open Models - June 2026
After an overwhelming April and an OK May, here's June. The graph has fewer items this time because some were already covered here last month.
Finetunes: Nex-N2, Ornith-1.0, Agents-A1, Holo3.1, Tmax-27b, MusaCoder-27B, VibeThinker-3B
NVFP4 from NVIDIA for the models below:…

TechCrunch — AI · news-outlet · 15h ago
Gemini Spark, Google’s agentic assistant, is now available on Mac
Google's 24/7 agentic assistant, Gemini Spark, comes to Mac alongside other improvements, like real-time tracking and support for more apps.

r/LocalLLaMA · community · 16h ago
Agent execution visualizer
I've seen projects that stream tool-use status and subagent generation, and represent it with a nice little visual based on the tool being used, etc. It would be pretty cool to pair this with some live model visualisations, like a QKV heatmap across attention heads. Not for…

Hugging Face Daily Papers · research · 17h ago
QVal: Cheaply Evaluating Dense Supervision Signals for Long-Horizon LLM Agents
AI summary: A testbed called QVal is introduced for evaluating dense supervision signals in long-horizon LLM agent tasks by measuring how well method scores align with Q-values, enabling fair comparison of different supervision approaches without training.
Generated by…

r/LocalLLaMA · community · 18h ago
Hister: Give Your AI Assistant a Private Memory
I have been working on Hister, a self-hosted search engine that automatically indexes pages you visit, local files, and documentation, then keeps them searchable with stored offline previews. It also exposes an MCP endpoint, so local AI assistants can search your own indexed…

Hugging Face Daily Papers · research · 19h ago
Managing Procedural Memory in LLM Agents: Control, Adaptation, and Evaluation
AI summary: Procedural memory enhances LLM agents on workplace tasks through skill transfer across roles and models, with varying generalization capabilities affecting deployment strategies.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct
Abstract: Procedural memory is increasingly used to…

r/MachineLearning · community · 20h ago
A system-level approach to prompt injection: separating instruction and data channels in LLM agents [P]
Prompt injection has emerged as one of the most persistent failure modes in tool-using LLM systems, particularly in agentic workflows where models interact with external data sources. Most mitigation strategies focus on input filtering or model-side alignment, but these…

Hugging Face Daily Papers · research · 20h ago
SkillHone: A Harness for Continual Agent Skill Evolution Through Persistent Decision History
AI summary: SkillHone enables continuous evolution of agent skills by maintaining persistent decision histories and incorporating practice feedback for improved performance across research and tool-mediated analysis tasks.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct
Abstract: Agent skills…

Hugging Face Daily Papers · research · 20h ago
DataEvolver: Self-Evolving Multi-Agent Data Construction for Text-Rich Image Generation
AI summary: DataEvolver is a self-evolving multi-agent framework that improves text-rich image generation by leveraging feedback from rejected samples to iteratively enhance data quality.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct
Abstract: Text-rich image generation is one of the most…

Hugging Face Daily Papers · research · 21h ago
Xiaomi-GUI-0 Technical Report
AI summary: A native multimodal GUI agent trained in real-device environments demonstrates superior performance and stability compared to traditional benchmark-based approaches.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct
Abstract: Graphical user interface (GUI) agents build on…

r/LocalLLaMA · community · 23h ago
Ketch - Best Search Tool for local models
Recently I wrote a blog post to find out which search tool works best for the pi coding agent paired with local models (currently I use Qwen3.6 35B). Before that I was using firecrawl or brave-search, but found them very decent, so I went to SearXNG, which is fine, but lacks some…

Latent.Space · news-outlet · 1d ago
AIEWF Daily Dispatch: Loops, Software Factories & Forward Deployed Engineers
On Tuesday at the AI Engineer World's Fair, there was a lot of talk about loops, agent engineering, and the emergence of software factories. Also a hot topic: open models.

arXiv — Machine Learning · research · 1d ago
PPT-Eval: A Benchmark for Computer-Use Agents on PowerPoint Tasks
arXiv:2606.31154v1 · Announce Type: new
Abstract: Creating and editing slides is a rich, multimodal activity that is ubiquitous in professional and educational settings, making it an ideal testbed for real-world computer-use agents. Microsoft PowerPoint is among the most widely…

arXiv — Machine Learning · research · 1d ago
Expected Gain-based Escalation in Vertical Federated Learning
arXiv:2606.31331v1 · Announce Type: new
Abstract: Collaborative inference can improve predictive performance by integrating complementary information across agents, but applying collaborative fusion to every sample can incur unnecessary communication and computational overhead.…

arXiv — NLP / Computation & Language · research · 1d ago
Calibrating the Evaluator: Does Probability Calibration Mitigate Preference Coupling in LLM Agent Feedback Loops?
arXiv:2606.31371v1 · Announce Type: cross
Abstract: When large language model (LLM) agents adapt their behavior through evaluator feedback, systematic evaluator biases propagate into the agent's learned strategy distribution, a phenomenon termed evaluator preference coupling.…

arXiv — Machine Learning · research · 1d ago
ECHO: Prune to act, trace to learn with selective turn memory in agentic RL
arXiv:2606.31650v1 · Announce Type: new
Abstract: Long-horizon language agents must repeatedly interact with tools, accumulate evidence, and make decisions under bounded context windows. Existing context-management methods make such rollouts feasible by truncating distant history,…

arXiv — Machine Learning · research · 1d ago
TRIAGE: Role-Typed Credit Assignment for Agentic Reinforcement Learning
arXiv:2606.32017v1 · Announce Type: new
Abstract: Agentic reinforcement learning requires assigning credit to environment-facing actions such as searches, clicks, edits, navigation commands, and object interactions. Standard GRPO uses the final verifier outcome as a uniform…

arXiv — NLP / Computation & Language · research · 1d ago
QVal: Cheaply Evaluating Dense Supervision Signals for Long-Horizon LLM Agents
arXiv:2606.32034v1 · Announce Type: cross
Abstract: LLM agents increasingly act over long horizons, where a single trajectory can contain hundreds or thousands of actions. In these settings, outcome-only rewards provide guidance that is too sparse, failing to inform the model about the…

arXiv — NLP / Computation & Language · research · 1d ago
A Single Rewrite Suffices: Empirical Lessons from Production Skill Description Optimization
arXiv:2606.30775v1 · Announce Type: new
Abstract: Enterprise AI agents route user queries to specialized skills by matching queries against natural language skill descriptions. When two skills share overlapping descriptions, the routing LLM misroutes queries, a failure we term…