Tag

Developer Tool

500 articles archived under #developer-tool

arXiv — NLP / Computation & Language research 6d ago

Somatic in the East, Psychological in the West?: Investigating Clinically-Grounded Cross-Cultural Depression Symptom Expression in LLMs

arXiv:2508.03247v2 Announce Type: replace Abstract: Prior clinical psychology research shows that Western individuals with depression tend to report psychological symptoms, while Eastern individuals report somatic ones. We test whether Large Language Models (LLMs), which are…

5
Hugging Face Daily Papers research 6d ago

GUI vs. CLI: Execution Bottlenecks in Screen-Only and Skill-Mediated Computer-Use Agents

Abstract: Computer-use agents can execute software tasks through either graphical interfaces or programmatic command interfaces, but existing evaluations confound interaction modality with differences in tasks, initial states, verifiers, and permitted actions. We introduce a…

7
r/LocalLLaMA community 6d ago

Good YouTube channels for local LLM news and development?

Sometimes I'd prefer chilling on the couch and learning instead of reading. I've searched on YouTube and most seem like clickbait and slop. Thanks. Submitted by /u/6jarjar6

5
r/LocalLLaMA community 6d ago

Which model for technical documentation?

Looking to create high level / low level designs (software), based on existing templates/examples, cross reference code, use mcp to download confluence/jira data - also plug into agentic ‘coding’ frameworks opencode. I mostly use opus 3.6 with Kiro-cli, but I want my data…

32
Hacker News — AI on Front Page community 6d ago

Show HN: OpenKnowledge – open source AI-first alternative to Obsidian/Notion

Hi HN, Nick here. We’re launching OpenKnowledge (https://openknowledge.ai/), a “what you see is what you get” markdown editor that has direct integrations with Claude, Codex, and other agents. Available as MacOS app or Web UI+CLI. Fully free/local and OSS. We built this…

20
Vercel — AI dev-tools 6d ago

AI SDK 7

AI SDK, with over 16 million weekly downloads, is the TypeScript SDK for building AI applications, features, frameworks, and agents across any model provider. It's the same layer eve, Vercel's open-source agent framework, is built on. AI SDK 7 adds production depth for agent…

15
r/LocalLLaMA community 7d ago

Worse quality with MTP - Qwen 3.6, Gemma 4

Hi. I am self-hosting Qwen 3.6 27B Q8_K_XL with Llama.cpp on 4x5070ti. (All 4 cards are on single x16 slot bifurcated to 4x4 with risers). I've been testing it on several work repos with Opencode CLI and in like 8/10 situations the output of non-MTP model is far superior to the…

8
Vercel — AI dev-tools 7d ago

AI SDK 7 is now available

AI SDK 7 is a major release for building production agents in TypeScript. The SDK has grown from model calls and chat primitives into a broader agent platform for developing, running, integrating, and observing agents across text, audio, realtime, image, and video. Every major…

8
arXiv — Machine Learning research 7d ago

Enhancing Clinician Decision-Making via Uncertainty-Aware Multi-Expert Fusion for Stroke Rehabilitation

arXiv:2606.24960v1 Announce Type: new Abstract: Tailoring stroke rehabilitation requires assessing how movements are organized, not merely if they succeed. Currently, this assessment is a rate-limiting bottleneck. Instruments like the Action Research Arm Test (ARAT) compress…

20
arXiv — Machine Learning research 7d ago

Communicability-Inspired Positional Encoding (CIPE)

arXiv:2606.25293v1 Announce Type: new Abstract: Positional encodings (PEs) are essential for Transformers. Yet designing effective PEs for non-Euclidean graphs remains challenging. Such encodings should ideally induce an Attention-Compatible Geometry for self-attention: not…

15
arXiv — Machine Learning research 7d ago

Interpretable Concept-Guided Polynomial Tabular Kolmogorov-Arnold Network for EEG-Based Mild Cognitive Impairment Detection

arXiv:2606.25434v1 Announce Type: new Abstract: Early and scalable detection of mild cognitive impairment (MCI) remains an unresolved clinical challenge. Existing EEG-based screening approaches are constrained by handcrafted feature pipelines that discard neurophysiologically…

10
arXiv — NLP / Computation & Language research 7d ago

Cliff Tokens: Identifying Single-Token Failure Triggers in LLM Mathematical Reasoning

arXiv:2606.25524v1 Announce Type: cross Abstract: Large language models (LLMs) reach high accuracy in mathematical reasoning, but individual traces on the same problem diverge; some arrive at the correct answer while others fail. Prior work analyzes failure at the step, chunk,…

32
arXiv — NLP / Computation & Language research 7d ago

Uncertainty Quantification for Computer-Use Agents: A Benchmark across Vision-Language Models and GUI Grounding Datasets

arXiv:2606.25760v1 Announce Type: cross Abstract: Computer-use agents turn vision-language model (VLM) predictions into executable GUI clicks, so reliable uncertainty estimates are essential for rejection, calibration, miss-severity ranking, and spatial safety regions. Yet…

14
arXiv — NLP / Computation & Language research 7d ago

Paid Voices vs. Public Feeds: Interpretable Cross-Platform Theme-Based Analysis of Climate Discourse

arXiv:2601.13317v2 Announce Type: replace Abstract: Climate discourse online shapes public understanding of climate change and informs political and policy debate, yet it unfolds across structurally different environments: paid advertising platforms host targeted,…

9
Vercel — AI dev-tools 7d ago

Deep Agents and OpenCode are now available in the AI SDK Harness

The AI SDK Harness lets you run established coding-agent runtimes through one unified interface, so you can switch runtimes without changing your application code. Today we're adding two new adapters, Deep Agents and OpenCode, both running inside a Vercel Sandbox. Deep Agents…

27
Vercel — AI dev-tools 7d ago

Vercel Flags no longer requires SDK Keys for Vercel deployments

New projects using Vercel Flags no longer need to configure SDK Keys or the FLAGS environment variable when evaluating flags inside a Vercel deployment. At runtime, the Vercel adapter automatically receives a short-lived OIDC token, so authentication is handled for you with zero…

5
Anthropic SDK (Python) releases dev-tools 7d ago

v0.112.0

0.112.0 (2026-06-24). Full Changelog: v0.111.0...v0.112.0. Features: client: add support for system.message streaming events (2450d59). Bug Fixes: memory tool: create parent directories with the correct permissions (#135) (f2fc2a9). Chores: api: add support for new refusal…

21
arXiv — Machine Learning research 8d ago

Reconstructing GRACE Terrestrial Water Storage with Spatio-Temporal Graph Neural Networks: An Application to South America

arXiv:2606.23833v1 Announce Type: new Abstract: Terrestrial water storage (TWS) integrates snow, soil moisture, surface water, and groundwater and is a key indicator of how climate variability and human activity reshape the global water cycle. The GRACE and GRACE-FO satellite…

21
arXiv — Machine Learning research 8d ago

Federated Survival Analysis in Healthcare: A Multi-Model Evaluation on Cross-Institutional Heterogeneous Breast Cancer Data

arXiv:2606.23871v1 Announce Type: new Abstract: Survival analysis is central to clinical decision-making, yet reliable time-to-event models require large, diverse cohorts that are rarely available at a single institution, while privacy regulations restrict the centralization of…

28
arXiv — Machine Learning research 8d ago

GRACE: Gated Refinement for Accurate Causal Edge Discovery in High-Dimensional Time Series

arXiv:2606.23880v1 Announce Type: new Abstract: From climate teleconnections to gene regulation, modern time-series datasets encompass tens or hundreds of interacting variables, making causal discovery increasingly challenging. Constraint-based methods offer statistical rigor…

30
arXiv — Machine Learning research 8d ago

KLip-PPO: A per-sample KL perspective on PPO-Clip

arXiv:2606.23932v1 Announce Type: new Abstract: Proximal Policy Optimization (PPO) is the standard policy-gradient algorithm for on-policy reinforcement learning. The literature presents it in two forms, a clipped surrogate that bounds the importance ratio between successive…

8
arXiv — Machine Learning research 8d ago

Cyclic Denoising Reveals Ultrastable Memories in Diffusion Models

arXiv:2606.24000v1 Announce Type: new Abstract: We introduce cyclic denoising -- repeated forward and reverse diffusion at controlled noise amplitudes -- as an extraction attack for image diffusion models. Inspired by random organization in disordered solids, cyclic denoising…

17
arXiv — NLP / Computation & Language research 8d ago

RASC+: Retrieval-Constrained LLM Adjudication for Clinical Value Set Authoring

arXiv:2606.23992v1 Announce Type: new Abstract: Clinical value sets define the standardized terminology codes used in quality measurement, phenotyping, cohort construction, and clinical decision support. The recently introduced Retrieval-Augmented Set Completion (RASC) benchmark…

32
arXiv — NLP / Computation & Language research 8d ago

PORTER: Language-Grounded Event Representations for Portable Structured EHR Foundation Models

arXiv:2606.24102v1 Announce Type: new Abstract: Most electronic health record (EHR) foundation models encode clinical events as discrete event tokens from a fixed vocabulary and therefore cannot directly represent events containing unseen concepts or new combinations of concepts…

35
arXiv — Machine Learning research 8d ago

MedPCFM: Improving Medical Point Cloud Completion by Integrating Point Transformers and Flow Matching

arXiv:2606.24433v1 Announce Type: cross Abstract: Medical point cloud completion is important for anatomical reconstruction and downstream clinical workflows, yet generative modeling in this setting remains insufficiently studied. We investigate completion through…

28
arXiv — NLP / Computation & Language research 8d ago

One Year Later...The Harms Persist, But So Do We!

arXiv:2606.23884v1 Announce Type: new Abstract: General-purpose large language models (LLMs) are increasingly used for mental health-related conversations, yet safety safeguards remain inadequate and inconsistent across clinical conditions. This study evaluates six proprietary…

26
arXiv — NLP / Computation & Language research 8d ago

MedBench v5: A Dynamic, Process-Oriented, and Hallucination-Aware Benchmark for Clinical Multimodal Models

arXiv:2606.24155v1 Announce Type: new Abstract: Existing medical AI benchmarks lack process visibility, atomic skill evaluation, and integrated hallucination detection. We introduce MedBench v5, a redesigned benchmark for clinical multimodal models (language, vision-language,…

38
arXiv — NLP / Computation & Language research 8d ago

MMed-Bench-IR: A Heterogeneous Benchmark for Multilingual Medical Information Retrieval

arXiv:2606.24200v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG) in clinical settings increasingly requires multilingual retrieval against predominantly English evidence corpora. Multilingual medical retrieval demands three capabilities: cross-lingual…

36
arXiv — NLP / Computation & Language research 8d ago

A specialized reasoning large language model for accelerating rare disease diagnosis: a randomized AI physician assistance trial

arXiv:2606.24510v1 Announce Type: cross Abstract: Rare diseases affect millions of individuals worldwide, yet timely diagnosis remains a major public health challenge due to scarcity of specialized clinical expertise. While large language models (LLMs) show promise to support…

28
arXiv — NLP / Computation & Language research 8d ago

Few shot chain-of-thought driven reasoning to prompt LLMs for open ended medical question answering

arXiv:2403.04890v4 Announce Type: replace Abstract: In this paper, we propose a modified version of the MedQA-USMLE dataset, named MEDQA-OPEN, which contains open-ended medical questions without options to mimic clinical scenarios, along with clinician-approved reasoned answers.…

7
Hacker News — AI on Front Page community 8d ago

Fired by Google for creating the Google workspace CLI

https://xcancel.com/JPoehnelt/status/2069482265953087602 Comments URL: https://news.ycombinator.com/item?id=48649011 Points: 252 # Comments: 173

36
r/LocalLLaMA community 8d ago

650+ Apache-2.0 biomedical NER/de-id models that run on-device in MLX. Same fp32 weights, identical outputs: the clinical NER models run 30-40x faster than PyTorch-CPU on a 3-year-old M3 Max. Repro inside.

Disclosure first: I maintain OpenMed, so read this with that bias. I'm posting the numbers with the full methodology and a runnable script so you can reproduce or tear it apart. I'm here for the next couple of hours to answer methodology questions. What it is: an open-source…

25
r/LocalLLaMA community 8d ago

I benchmarked 8 LLMs for medical scribing. Hallucinations were rare; omissions need attention.

I ran a small benchmark on LLMs for medical scribing. Reason: most discussion around AI scribe safety focuses on hallucinations. That matters, but in notes I kept seeing another problem: models often leave out clinically relevant details from the conversation. So I evaluated 8…

10
MIT Technology Review — AI news-outlet 8d ago

The $400 million machine powering the future of chipmaking

Jos Benschop is climbing a ladder to get to the top of his newest machine. It’s a bit of a schlep. The contraption is the size of a double-decker bus—more than 150 tons of gleaming precision-milled aluminum covered in thousands of snaking tubes, colored cables, and…

24
Vercel — AI dev-tools 9d ago

Deploy Node servers with zero configuration

You can now deploy a Node.js server to Vercel with zero configuration. Vercel detects a server.ts file at the project root or at src/server.ts and deploys it as a Node.js application, in addition to existing zero-configuration backends like Express, Koa, and NestJS: Vercel CLI…

12
Hugging Face Daily Papers research 9d ago

CLI-Universe: Towards Verifiable Task Synthesis Engine for Terminal Agents

Abstract: A principled synthesis engine generates high-quality terminal-agent tasks through multi-dimensional capability taxonomy and evidence-guided research, creating a distilled dataset that enables significant performance gains in LLM training. Generated by…

5
Hugging Face official-blog 9d ago

Shipping huggingface_hub every week with AI, open tools, and a human in the loop

Published June 23, 2026, by Lucain Pouget (Wauplin) and Célina Hanouti (celinah). huggingface_hub is the Python client at the base of the Hugging Face…

18
Vercel — AI dev-tools 9d ago

Redesigned trace viewer for Vercel Workflows

The trace viewer for Vercel Workflows and Workflow SDK has been redesigned to better support inspecting runs from start to finish. Search across spans, zoom into any section of the timeline, and step through with the keyboard to find what you're looking for fast, then click into…

19
Vercel — AI dev-tools 9d ago

Preserve local environment variables when linking with the Vercel CLI

The Vercel CLI now preserves your .env.local file when running vercel link. Previously, linking could overwrite variables already in the file. The CLI now updates VERCEL_OIDC_TOKEN if it exists, or appends it if missing, without touching anything else. Run pnpm i -g…
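The update-or-append behavior described above can be illustrated with a minimal Python sketch. This is not the Vercel CLI's implementation: the function name `upsert_env_var` and the quoting style are assumptions made for illustration, and the sketch only handles plain `KEY=value` lines.

```python
from pathlib import Path


def upsert_env_var(path: Path, key: str, value: str) -> None:
    """Set key in a dotenv-style file, leaving all other lines as they are.

    Hypothetical helper for illustration; not part of any Vercel tooling.
    """
    lines = path.read_text().splitlines() if path.exists() else []
    new_line = f'{key}="{value}"'
    for i, line in enumerate(lines):
        name, sep, _ = line.partition("=")
        # Replace only a real assignment to this exact key.
        if sep and name.strip() == key:
            lines[i] = new_line
            break
    else:
        # Key not present anywhere: append it at the end.
        lines.append(new_line)
    # Note: this normalizes the file to end with a single trailing newline.
    path.write_text("\n".join(lines) + "\n")
```

Calling `upsert_env_var(Path(".env.local"), "VERCEL_OIDC_TOKEN", token)` twice with different tokens leaves exactly one `VERCEL_OIDC_TOKEN` line, holding the second token, and every other variable and comment in place.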

9
Vercel — AI dev-tools 9d ago

Scaffold your chat apps with create-chat-sdk

Creating a new Chat SDK bot now takes a single command. create-chat-sdk scaffolds a complete Next.js project with your chosen platform adapters, a state adapter, environment variables, and a webhook route. The CLI walks you through selecting your platform adapters (e.g., Slack…

14
Vercel — AI dev-tools 9d ago

Chat SDK adds Kapso support

Chat SDK now supports Kapso with the new vendor-official adapter. Kapso connects your bot to WhatsApp through its hosted platform, handling the WhatsApp Business setup, credentials, and webhooks so you can focus on your bot's logic. Replies use the standard Chat SDK thread and…

31
Vercel — AI dev-tools 9d ago

Chat SDK adds Novu support

Chat SDK now supports Novu with the new vendor-official adapter. One handler set puts your agent on Slack, Microsoft Teams, WhatsApp, Telegram, and email. Novu handles credentials, identity, and delivery, keeping OAuth and tokens outside your app and mapping each channel to one…

32
Vercel — AI dev-tools 9d ago

Chat SDK adds Sendblue support

Chat SDK now supports Sendblue with the new vendor-official adapter. Build bots that send and receive iMessage, SMS, and RCS through Sendblue's hosted gateway, reaching people on the messaging apps they already use. Messages use iMessage-first delivery with support for…

10
Vercel — AI dev-tools 9d ago

Chat SDK adds Linq support

Chat SDK now supports Linq with the new vendor-official adapter. Build bots that send and receive texts in both direct messages and group chats, with bidirectional media and native iMessage tapback reaction support. Replies use the standard Chat SDK thread and message APIs,…

17
Hugging Face Daily Papers research 10d ago

BrainG3N: A Dual-Purpose Tokenizer for Controllable 3D Brain MRI Generation

Abstract: A 3D brain MRI generative model uses a masked-autoencoder tokenizer to create clinically informative embeddings that support both medical task performance and controlled image generation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct. Three-dimensional (3D) brain MRI is…

6
r/LocalLLaMA community 10d ago

Your Favorite Workflow to Convert PDF with Complex Structure to Markdown?

I've tried markitdown, Docling, and Mineru. Are there better tools I should try? I need to process tables, floating box, etc. Thanks! Submitted by /u/chibop1

30
Vercel — AI dev-tools 10d ago

WebSocket support is now in Public Beta

Vercel Functions can now serve WebSocket connections, enabling bidirectional communication between clients and server-side code on Vercel. Use WebSockets for realtime features such as interactive AI streaming, chat, and collaborative apps. WebSocket connections run Fluid compute…

23
Vercel — AI dev-tools 10d ago

Vercel CLI now supports signing blob URLs

You can now generate signed URLs for Vercel Blob directly from the Vercel CLI. A signed URL is a scoped URL with a set expiration time that lets you perform a single operation on a specific object. Each URL is scoped to one operation (get, head, put, or delete), one…
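To make the idea of a URL scoped to one operation, one object, and one expiration time concrete, here is a generic HMAC-based sketch in Python. It does not reproduce Vercel Blob's actual signing scheme, which the excerpt does not describe: the function names `sign_url` and `verify_url`, the query parameter names, and the message layout are all assumptions.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlsplit


def _signature(secret: bytes, operation: str, path: str, expires_at: int) -> str:
    # Bind the signature to the operation, the object path, and the expiry.
    message = f"{operation}\n{path}\n{expires_at}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_url(base_url: str, operation: str, secret: bytes, expires_at: int) -> str:
    """Return base_url with hypothetical op/expires/sig query parameters."""
    sig = _signature(secret, operation, urlsplit(base_url).path, expires_at)
    query = urlencode({"op": operation, "expires": expires_at, "sig": sig})
    return f"{base_url}?{query}"


def verify_url(url: str, operation: str, secret: bytes, now: int) -> bool:
    """Accept the URL only for the signed operation, path, and time window."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    try:
        op = query["op"][0]
        expires_at = int(query["expires"][0])
        sig = query["sig"][0]
    except (KeyError, ValueError):
        return False
    if op != operation or now >= expires_at:
        return False
    expected = _signature(secret, op, parts.path, expires_at)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(sig.encode(), expected.encode())
```

A URL signed for `get` fails verification when presented for `put`, after `expires_at`, or with a different object path, because each of those values is part of the signed message.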

30
Vercel — AI dev-tools 10d ago

Workflow SDK now compresses run and step payloads

The Workflow SDK 5 beta now compresses all run, hook, and step inputs and outputs with zstd. Compression kicks in automatically, but only when it helps. Small payloads stay as-is, larger ones get compressed before they're persisted. Compressed payloads use less storage and are…
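The "compress only when it helps" policy can be sketched in a few lines of Python. This is an illustration and not the Workflow SDK's code: it substitutes the standard library's zlib for zstd, and the size threshold, the one-byte markers, and the function names are assumptions.

```python
import zlib

RAW = b"\x00"         # hypothetical marker: payload stored as-is
COMPRESSED = b"\x01"  # hypothetical marker: payload stored compressed
MIN_SIZE = 1024       # hypothetical threshold below which we never compress


def encode_payload(data: bytes) -> bytes:
    """Compress large payloads, but keep the original if compression does not shrink it."""
    if len(data) >= MIN_SIZE:
        packed = zlib.compress(data)  # zlib stands in for zstd here
        if len(packed) < len(data):
            return COMPRESSED + packed
    return RAW + data


def decode_payload(blob: bytes) -> bytes:
    """Reverse encode_payload by dispatching on the leading marker byte."""
    marker, body = blob[:1], blob[1:]
    if marker == COMPRESSED:
        return zlib.decompress(body)
    if marker == RAW:
        return body
    raise ValueError("unknown payload marker")
```

Storing a marker alongside the payload lets the reader decode both forms without guessing, which is what allows small or incompressible payloads to be persisted unchanged.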

16
Simon Willison community 10d ago

sqlite-utils 4.0rc1 adds migrations and nested transactions

sqlite-utils is my combined Python library and CLI tool for working with SQLite databases. It provides an extensive set of higher-level operations on top of Python's default sqlite3 package, including support for complex table transformations, automatic table creation from…

13
