Agents + tool use · 500 articles archived under #agents
llama.cpp releases dev-tools 15d ago b9691 ggml-cpu: Conditionally enable power11 backend based on compiler support ( #24687 ) ggml: Conditionally enable power11 backend based on compiler support Guard POWER11 backend creation behind a compiler flag check for -mcpu=power11. This avoids build failures on current GCC/Clang… 14 r/LocalLLaMA community 15d ago Lemonade v10.8: auto memory management, cloud offload, Omni improvements, and call your local models as MCP tools v10.8 is out, so here's a project update on what landed. This was a 20-contributor release in just 7 days! Smarter memory and context management Dynamic VRAM management now auto-unloads idle models and downsizes their KV-cache to reclaim GPU memory on the fly, plus model pinning… 27 Ars Technica — AI news-outlet 15d ago AI coding agents taught robots how to install GPUs and cut zip-ties NVIDIA’s self-improvement program for robots enlists teams of AI coding agents. 13 TechCrunch — AI news-outlet 15d ago NEA’s Tiffany Luck on AI IPOs, personal agents, and the ROI reckoning Tokenmaxxing was the hottest trend in Silicon Valley earlier this year, with CEOs encouraging employees to push AI usage as far as it would go. Then the bill came due. Uber reportedly blew through its annual AI budget in a few months, some companies… 23 r/LocalLLaMA community 15d ago GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine? 
arXiv : https://arxiv.org/abs/2606.17861 Full Paper : https://arxiv.org/pdf/2606.17861 HuggingFace : https://huggingface.co/papers/2606.17861 GitHub : https://github.com/tongxuluo/gamecraft-bench Project : https://tongxuluo.github.io/gamecraft-bench-website/ I see big/large… 20 llama.cpp releases dev-tools 15d ago b9685 [SYCL] add dev2dev memcpy by SYCL API ( #24476 ) add dev2dev memcpy by SYCL API mv GGML_SYCL_DEV2DEV_MEMCPY to runntime table update the detect method for p2p comm fix the erro created during fix confilct Co-authored-by: Neo Zhang macOS/iOS: macOS Apple Silicon (arm64) macOS… 33 Vercel — AI dev-tools 15d ago Vercel Ship 2026 recap For a decade, Vercel has shaped how the web gets built. Now, we’re doing the same for agents. The companies that win the next decade will build on infrastructure designed for agents from the start, and over 2,500 people gathered in London this week to do just that at Vercel Ship… 20 r/LocalLLaMA community 15d ago GLM-5.2 is a win for local AI I know GLM 5.2's massive 753B footprint means none of us are running it at home without an enterprise cluster, but having a true frontier-level, MIT-licensed coding agent out in the wild makes me optimistic. The distillation potential here is massive. Once the community starts… 38 r/LocalLLaMA community 15d ago Headless screenshot loops let a local 30B agent finish a raytraced FPS demo in pure C Some background so this is honest. Over the past few months I ran a lot of oneshot experiments with single file three.js games. Minecraft clones, that kind of thing. I picked those on purpose because they sit deep in the training data and are trivial to debug by eye. 
The goal… 37 Hugging Face Daily Papers research 15d ago Dr-DCI: Scaling Direct Corpus Interaction via Dynamic Workspace Expansion Abstract DR-DCI framework combines retrieval with direct corpus interaction by dynamically pulling relevant documents into a local workspace, enabling scalable and efficient agentic search across large corpora. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Agentic search over… 27 Hugging Face Daily Papers research 15d ago Visual-Seeker: Towards Visual-Native Multimodal Agentic Search via Active Visual Reasoning Abstract Visual-Seeker enables visual-native multimodal deep search through active visual reasoning, outperforming proprietary models on real-world web environments. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multimodal large language models (MLLMs) have demonstrated… 25 llama.cpp releases dev-tools 15d ago b9674 SYCL: fix use-after-free bug with async memcpy in MoE prefill ( #24676 ) SYCL: fix a bug with async memcpy make mmid_row_mapping_host persistent comment on stream->wait Apply suggestion from @sanmai Apply suggestion from @sanmai Apply suggestion from @sanmai macOS/iOS: macOS… 34 Hugging Face official-blog 15d ago From the Hugging Face Hub to robot hardware with Strands Agents and LeRobot Enterprise Article Published June 17, 2026 Sundar Raghavan rsundaraws amazon Cagatay Cali cagataydev amazon A walkthrough of the LeRobot integration in Strands… 28 arXiv — Machine Learning research 15d ago ProCUA-SFT Technical Report arXiv:2606.17321v1 Announce Type: new Abstract: Training computer-use agents (CUAs) -- models that interact with graphical desktops through screenshots and keyboard/mouse actions -- requires large-scale, diverse trajectory data collected in full desktop environments. 
The largest… 9 arXiv — Machine Learning research 15d ago Offline Preference-Based Trajectory Evaluation arXiv:2606.17541v1 Announce Type: new Abstract: Offline evaluation of agentic systems often collapses trajectories to terminal success, discarding information about partial progress and inducing widespread ties, creating substantial statistical inefficiency by reducing effective… 20 arXiv — NLP / Computation & Language research 15d ago EnvRL: Learn from Environment Dynamics in Agentic Reinforcement Learning arXiv:2606.17680v1 Announce Type: cross Abstract: Reinforcement learning (RL) has emerged as a powerful paradigm for training Large Language Models (LLMs) as agents. However, conventional RL methods for long-horizon agentic tasks often struggle with sparse outcome rewards.… 15 arXiv — NLP / Computation & Language research 15d ago MemSlides: A Hierarchical Memory Driven Agent Framework for Personalized Slide Generation with Multi-turn Local Revision arXiv:2606.17162v1 Announce Type: new Abstract: Personalized presentation generation requires more than conditioning on a current prompt or template: agents must preserve stable user preferences across tasks, retain newly introduced preferences and constraints during multi-turn… 25 arXiv — NLP / Computation & Language research 15d ago PromptMN: Pseudo Prompting Language arXiv:2606.17164v1 Announce Type: new Abstract: Prompting has become the primary interface between humans and generative AI, yet many natural language prompts remain fragile: roles, goals, constraints, and expected outputs are often buried in prose or left implicit. In agentic… 13 arXiv — NLP / Computation & Language research 15d ago Scaling Enterprise Agent Routing: Degradation, Diagnosis, and Recovery arXiv:2606.17519v1 Announce Type: new Abstract: Production LLM assistants route user requests to growing libraries of specialized tools, but how does routing accuracy degrade as the catalog scales? 
We study single-step routing on a 110-agent, 584-tool catalog from a deployed… 14 arXiv — NLP / Computation & Language research 15d ago OPD-Evolver: Cultivating Holistic Agent Evolver via On-Policy Distillation arXiv:2606.17628v1 Announce Type: new Abstract: Memory has become a standard substrate for self-evolving agents, yet retaining experience is not the same as learning how to evolve through it. Existing memory agents can store trajectories, retrieve reflections, or accumulate… 37 arXiv — NLP / Computation & Language research 15d ago Environment-Grounded Automated Prompt Optimization for LLM Game Agents arXiv:2606.17838v1 Announce Type: new Abstract: LLM agents in interactive environments are highly sensitive to their prompts, yet prompt engineering remains a manual, task-specific process. We introduce an automated prompt optimization framework for LLM agents that decomposes… 20 arXiv — NLP / Computation & Language research 15d ago GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine? arXiv:2606.17861v1 Announce Type: new Abstract: Game generation is an emerging application of coding agents, requiring models to transform natural-language specifications into playable interactive systems. Unlike traditional coding tasks, game generation takes place within a… 28 arXiv — NLP / Computation & Language research 15d ago Compositional Skill Routing for LLM Agents: Decompose, Retrieve, and Compose arXiv:2606.18051v1 Announce Type: new Abstract: LLM agents increasingly rely on external skills -- reusable tool specifications -- but real-world tasks often require composing multiple skills, not just selecting one. 
We formalize this as the Compositional Skill Routing problem:… 8 arXiv — NLP / Computation & Language research 15d ago RubricsTree: Scalable and Evolving Open-Ended Evaluation of Personal Health Agents across Health Memory and Medical Skills arXiv:2606.18203v1 Announce Type: new Abstract: The LLM-empowered personal health agents with user health (sensor) metrics have offered a promising pathway to alleviate global disparities in healthcare access. However, large-scale clinical deployment remains constrained by an… 28 arXiv — NLP / Computation & Language research 15d ago ReproRepo: Scaling Reproducibility Audits with GitHub Repository Issues arXiv:2606.18237v1 Announce Type: new Abstract: Reproducing research results from papers and released code is central to scientific progress. Existing works have introduced benchmarks to evaluate whether LLM agents can assist with reproducibility, but they are difficult to scale… 36 arXiv — NLP / Computation & Language research 15d ago Securing Multi-Agent GIS Systems: Risk Evaluation and Prompt Hardening Optimization arXiv:2606.17092v1 Announce Type: cross Abstract: Agentic systems are increasingly integrated with geographic information systems (GIS), where multi-agent coordination enables complex conversational and spatial analysis but introduces security risks. This work presents a… 8 arXiv — NLP / Computation & Language research 15d ago Visuals Lie, Consistency Speaks: Disentangling Spatial Attention from Reliability in Vision-Language Models arXiv:2606.17389v1 Announce Type: cross Abstract: Multimodal Foundation Models are increasingly used as reasoning agents, making reliability, knowing when a model may hallucinate, critical. 
A common intuition, which we call the Attention-Confidence Assumption, holds that… 24 arXiv — NLP / Computation & Language research 15d ago PARSE: Provenance-Aware Retrieval Sanitization for Professional Domain LLM Agents arXiv:2606.17467v1 Announce Type: cross Abstract: Prompt injection defenses evaluated on synthetic benchmarks do not generalize to real enterprise documents, which are longer, denser, and interleave legitimate authority language with factual content. We demonstrate this gap with… 17 arXiv — NLP / Computation & Language research 15d ago Beyond Domains: Reusing Web Skills via Transferable Interaction Patterns arXiv:2606.17645v1 Announce Type: cross Abstract: Large language model (LLM) web agents are usually deployed as tool callers: each turn, the model reads a fresh page observation and emits one structured tool action. When every action is a low-level primitive, horizons grow… 32 arXiv — NLP / Computation & Language research 15d ago EComAgentBench: Benchmarking Shopping Agents on Long-Horizon Tasks with Distributed Hidden Intent arXiv:2606.17698v1 Announce Type: cross Abstract: As LLM-based shopping agents enter production, existing benchmarks fail to capture how a shopper's requirements arrive: stated implicitly in the query, recorded in a profile, or revealed only when the right question is asked.… 24 arXiv — NLP / Computation & Language research 15d ago Position: Coding Benchmarks Are Misaligned with Agentic Software Engineering arXiv:2606.17799v1 Announce Type: cross Abstract: Coding agents have become a major mode of software engineering, but the benchmarks we use to compare them were designed in a pre-agent era: they collapse model, harness, and environment into a single end-to-end score, typically… 33 arXiv — NLP / Computation & Language research 15d ago A Framework for Evaluating Agentic Skills at Scale arXiv:2606.17819v1 Announce Type: cross Abstract: Agent skills -- structured, reusable knowledge artifacts that augment LLM agent 
capabilities -- have been rapidly adopted in industry, yet their cross-domain impact and use across commercial and open-source models remain… 10 arXiv — NLP / Computation & Language research 15d ago ProvenanceGuard: Source-Aware Factuality Verification for MCP-Based LLM Agents arXiv:2606.18037v1 Announce Type: cross Abstract: Tool-using LLM agents increasingly use the Model Context Protocol (MCP) to answer from heterogeneous evidence sources, including search, APIs, databases, clinical records, and formulary tools. Standard factuality metrics usually… 27 arXiv — NLP / Computation & Language research 15d ago PseudoBench: Measuring How Agentic Auto-Research Fuels Pseudoscience arXiv:2606.18060v1 Announce Type: cross Abstract: As Large Language Model based agents enter autonomous scientific research, their ability to resist pseudoscience becomes increasingly important. Otherwise, such systems may rapidly generate plausible yet misleading studies that… 13 arXiv — NLP / Computation & Language research 15d ago Your AI Travel Agent Would Book You a Bullfight: An Agentic Benchmark for Implicit Animal Welfare in Frontier AI Models arXiv:2606.18142v1 Announce Type: cross Abstract: AI agents are moving from advisors to actors, booking travel, planning menus, and running procurement on behalf of users. Existing benchmarks for AI and animal welfare evaluate model text responses to question-answer prompts,… 21 arXiv — NLP / Computation & Language research 15d ago Atlas: Orchestrating Heterogeneous Models and Tools for Multi-Domain Complex Reasoning arXiv:2601.03872v2 Announce Type: replace Abstract: The integration of large language models (LLMs) with external tools has significantly expanded the capabilities of AI agents. 
However, as the diversity of both LLMs and tools increases, selecting the optimal model-tool… 27 arXiv — NLP / Computation & Language research 15d ago LVLMs and Humans Ground Differently in Referential Communication arXiv:2601.19792v4 Announce Type: replace Abstract: For generative AI agents to partner effectively with human users, the ability to accurately predict human intent is critical. But this ability to collaborate remains limited by a critical deficit: an inability to model common… 9 Vercel — AI dev-tools 15d ago Introducing Vercel Connect Giving your agents access to your tools, data, and services is what makes them useful. As agents perform deeper work across systems, authenticating and authorizing that access becomes central to your application architecture. Today, agent access is usually granted through… 21 Vercel — AI dev-tools 15d ago Introducing eve Today, we are proud to introduce eve , an open-source agent framework for building, running, and scaling agents. eve is designed around the idea that building an agent should mean defining what it does without assembling all of the pieces that it needs to run in production.… 15 Hugging Face Daily Papers research 15d ago MemSlides: A Hierarchical Memory Driven Agent Framework for Personalized Slide Generation with Multi-turn Local Revision Abstract MemSlides presents a hierarchical memory framework for personalized presentation agents that separates long-term user profiles, working memory for session constraints, and tool memory for reusable execution experiences to enable stable personalization and reliable local… 21 Hugging Face Daily Papers research 15d ago ProCUA-SFT Technical Report Abstract Training computer-use agents using a large-scale synthetic dataset with automated task generation and verification achieves significantly improved performance on desktop interaction benchmarks. 
Generated by Qwen/Qwen2.5-Coder-32B-Instruct Training computer-use agents… 4 Hugging Face Daily Papers research 15d ago OPD-Evolver: Cultivating Holistic Agent Evolver via On-Policy Distillation Abstract OPD-Evolver is a self-evolving agent framework that combines slow-fast co-evolution with on-policy self-distillation to enhance memory management and policy learning across multiple domains. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Memory has become a standard… 28 Hugging Face Daily Papers research 15d ago Beyond Monolingual Deep Research: Evaluating Agents and Retrievers with Cross-Lingual BrowseComp-Plus Abstract Research agents face significant challenges when evidence is in a different language than the query, with performance degrading even when gold evidence is provided directly. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Deep research agents are increasingly evaluated on… 28 Hugging Face Daily Papers research 15d ago GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine? Abstract End-to-end game generation presents significant challenges for coding agents, requiring them to create complete playable games from natural language descriptions while meeting specific evaluation criteria for engine grounding, artifact completeness, and interactive… 31 Hugging Face Daily Papers research 15d ago LectūraAgents: A Multi-Agent Framework for Adaptive Personalized AI-Assisted Learning and Embodied Teaching Abstract LectūraAgents is a multi-agent framework that enables personalized learning through adaptive embodied teaching by mimicking professor-student interactions and generating coordinated teaching actions aligned with learner profiles. Generated by… 9 Vercel — AI dev-tools 15d ago Introducing eve, an open-source agent framework eve is now available in public preview. eve is an open-source framework for building, running, and scaling agents. 
An agent is just a directory of files, and production comes built in: durable execution, sandboxed compute, human-in-the-loop approvals, subagents, and evals. The smallest… 31 Hugging Face official-blog 15d ago Agentic Resource Discovery: Let agents search for tools, skills, and other agents. Published June 17, 2026 ben burtenshaw burtenshaw shaun smith evalstate If you build with agents today, you probably know three protocols.… 15 Vercel — AI dev-tools 15d ago CLI deployment limits removed We've removed CLI-specific deployment limits, making it easier to deploy from local machine and external CI/CD pipelines with instant feedback. Teams and AI agents can now deploy at the pace their workflows demand. Learn more about limits in the Documentation. 5 Vercel — AI dev-tools 15d ago Vercel for Enterprise Apps and Agents Today we are introducing Vercel for Enterprise Apps and Agents, a platform that gives your entire company the ability to ship with AI safely, behind your access and security boundaries. Over the past year, employees across Vercel shipped hundreds of agents and internal apps.… 34 NVIDIA Developer Blog official-blog 15d ago Building AI Agents for AR Glasses and XR Devices with NVIDIA XR AI Developers building for AR glasses and wearable devices face an infrastructure gap. The hardware is ready, but creating AI experiences requires integrating live... 33