Tag

Developer Tool

500 articles archived under #developer-tool

arXiv — NLP / Computation & Language research 1mo ago

The Deterministic Horizon: Impossibility Results as Design Specifications for Trustworthy AI Systems

arXiv:2605.23024v1 Announce Type: cross Abstract: Large language models now write software, draft legal documents, and produce clinical notes, yet fundamental limits, from Turing and Arrow to the No Free Lunch theorems, shape what computation can do. This thesis turns such…

22
arXiv — NLP / Computation & Language research 1mo ago

What Does the Server See? Understanding Privacy Leakage from Large Language Models in Split Inference

arXiv:2605.23158v1 Announce Type: cross Abstract: The deployment of large language models (LLMs) on resource-constrained devices remains challenging, spurring interest in split inference, where models are partitioned between client and server to reduce computational burden and…

6
r/LocalLLaMA community 1mo ago

OCR, granite-docling-258m vs granite-docling-2stage-258m: has anyone actually noticed any improvements?

IBM's granite-docling-2stage-258m: Granite Docling 2stage builds upon Granite Docling but introduces a key modification: it builds a dynamic prompt that precomputes layout objects found within a page, making it more robust on out-of-distribution…

19
r/LocalLLaMA community 1mo ago

Have we passed the peak of inflated expectations?

I noticed the number of people in this sub going down a bit and checked out Google Trends. Any idea what's causing this sharp decline? Submitted by /u/fairydreaming

18
r/MachineLearning community 1mo ago

Custom image encoder [P]

Hello, I would like to know whether building my own image encoder would be a good idea instead of using models like CLIP, SigLIP/SigLIP2, or DINO. My use case is video frame classification. My pipeline is the following: the client sends me a video stream, sampled at 1 frame per…

5
arXiv — Machine Learning research 1mo ago

HealthCraft: A Reinforcement Learning Safety Environment for Emergency Medicine

arXiv:2605.21496v1 Announce Type: new Abstract: Frontier language models are being deployed into clinical workflows faster than the infrastructure to evaluate them safely. Static medical-QA benchmarks miss the failure modes that matter in emergency medicine: trajectory-level…

4
arXiv — Machine Learning research 1mo ago

Calibration, Uncertainty Communication, and Deployment Readiness in CKD Risk Prediction: A Framework Evaluation Study

arXiv:2605.21566v1 Announce Type: new Abstract: Machine learning models for chronic kidney disease (CKD) risk prediction often post strong discrimination scores on internal test sets. Calibration and uncertainty quantification get far less attention, leaving clinicians without…

9
arXiv — Machine Learning research 1mo ago

ChronoMedicalWorld: A Medical World Model for Learning Patient Trajectories from Longitudinal Care Data

arXiv:2605.21963v1 Announce Type: new Abstract: Long-horizon clinical simulation -- predicting how a patient's physiology evolves over years under specified interventions -- is central to chronic-disease care, yet existing electronic health record (EHR) models are predominantly…

19
arXiv — Machine Learning research 1mo ago

Beyond Euclidean Proximity: Repairing Latent World Models with Horizon-Matched Trajectory Reachability Metrics

arXiv:2605.22164v1 Announce Type: new Abstract: Latent world models can contain the state needed for control, yet their terminal-cost interface can expose the planner to the wrong decision-relevant information. In common latent MPC, candidate sequences are ranked by Euclidean…

20
arXiv — Machine Learning research 1mo ago

Decomposing Ensemble Spread in Lorenz '96 With Learned Stochastic Parameterizations

arXiv:2605.22242v1 Announce Type: new Abstract: Weather and climate forecasts are inherently uncertain due to chaotic dynamics, imperfect initial conditions, and incomplete representation of the underlying physical processes. Operational ensemble forecasts aim to represent these…

30
arXiv — Machine Learning research 1mo ago

Explainable AI for Data-Driven Design of High-Dimensional Predictive Studies

arXiv:2605.22243v1 Announce Type: new Abstract: Predictive modelling is important for health data analysis and data-driven clinical decision-making. However, predictive studies are challenging to design optimally by hand when tens or even hundreds of features require selection,…

19
arXiv — Machine Learning research 1mo ago

No Epoch Like the Present: Robust Climate Emulation Requires Out-of-Distribution Generalisation

arXiv:2605.22248v1 Announce Type: new Abstract: Climate emulation is an out-of-distribution (OOD) projection task. This is precisely the challenge where modern Machine Learning (ML) methods are most prone to failure. Consequently, while current ML emulators trained on present…

38
arXiv — Machine Learning research 1mo ago

Detecting Atypical Clients in Federated Learning via Representation-Level Divergence

arXiv:2605.22266v1 Announce Type: new Abstract: Federated learning enables collaborative training across distributed clients with heterogeneous data, but such heterogeneity often leads to unstable updates and degraded global performance. Moreover, in practical deployments,…

29
arXiv — NLP / Computation & Language research 1mo ago

When Cases Get Rare: A Retrieval Benchmark for Off-Guideline Clinical Question Answering

arXiv:2605.21807v1 Announce Type: new Abstract: Across medical specialties, clinical practice is anchored in evidence-based guidelines that codify best studied diagnostic and treatment pathways. These pathways routinely fall short for the long tail of real-world care not covered…

34
arXiv — NLP / Computation & Language research 1mo ago

ChronoMedKG: A Temporally-Grounded Biomedical Knowledge Graph and Benchmark for Clinical Reasoning

arXiv:2605.22734v1 Announce Type: new Abstract: Biomedical knowledge graphs (KGs) treat disease associations as static facts, but temporal information is crucial for clinical reasoning, e.g., a symptom diagnostic of one disease at age 3 may imply a different disease at age 13.…

32
arXiv — NLP / Computation & Language research 1mo ago

The Double Dilemma in Multi-Task Radiology Report Generation: A Gradient Dynamics Analysis and Solution

arXiv:2605.22635v1 Announce Type: cross Abstract: While multi-task learning based automatic radiology report generation (RRG) is widely adopted to ensure clinical consistency, most focus on architectural designs yet remain limited to coarse linear scalarization strategies. These…

12
Hugging Face Daily Papers research 1mo ago

Training Large Language Models to Predict Clinical Events

Abstract: Longitudinal clinical notes are converted into temporal prediction examples using Foresight Learning, enabling improved clinical prediction through LoRA adaptation that enhances calibration and reduces uncertainty compared to base models. AI-generated summary…

34
r/LocalLLaMA community 1mo ago

Gmail tie-ins

Hey folks. I’m looking to set up a way to give a local LLM access to the Google Cloud SDK for Gmail functions. The goal is to have an LLM check a spreadsheet once daily and, based on criteria, send an email that will be structured exactly the same way each time, simply as a…

14
llama.cpp releases dev-tools 1mo ago

b9276

server: expose prompt token counts in /slots endpoint (#23454). Add n_prompt_tokens, n_prompt_tokens_processed, and n_prompt_tokens_cache to the /slots JSON response. These fields are already tracked internally but were not exposed, making it impossible for clients to monitor…

15
The Information — AI news-outlet 1mo ago

Workday Stock Jumps 10% After Company Reveals AI Agent Gains

Workday shares climbed more than 10% in after-hours trading on Thursday after the HR application maker said the number of customers using its AI agents in the three months ended April 30 roughly doubled from the previous quarter to more than 4,000. Gerrit Kazmaier, the company’s…

38
OpenAI Python SDK releases dev-tools 1mo ago

v2.38.0

2.38.0 (2026-05-21). Full Changelog: v2.37.0...v2.38.0. Features: api: api update (33d1d01); api: manual updates (a21700a); api: update OpenAPI spec or Stainless config (00265c5). Chores: api: docs updates (ee10152); check release PR custom code sync (2638779); remove release…

26
r/LocalLLaMA community 1mo ago

Qwen3.6 35Ba3 has changed my workflows and even how I use my computer

My workflow has basically changed to asking Codex to do certain tasks and then document how to do them (including errors it found along the way) in a skill. I feed that skill to pi, and suddenly my qwen3.6 gets that hard stuff done: devops on a VPS; using docling to create epubs…

33
Google DeepMind official-blog 1mo ago

We’re launching the Google DeepMind Accelerator program in Asia Pacific to tackle environmental risks

The Asia-Pacific region is a global engine for economic growth, but it's also highly vulnerable to climate change. While green technologies are gaining momentum, a recent report shows they aren’t scaling fast enough to keep up with the region’s rising environmental risks. To…

22
r/LocalLLaMA community 1mo ago

LlamaStation v0.9 — llama.cpp GUI for Windows with multi-backend support, TurboQuant, MTP and more

I've been building this for the past few months as a side project — started because I didn't want to run llama.cpp from the command line every time I wanted to try a model. I just wanted something that worked with a click. Fair warning: I'm not a developer. This is 100% vibe…

33
TechCrunch — AI news-outlet 1mo ago

With aluminum prices up 20%, recycling startups bet on AI to cash in

Recycling startups are using AI to improve the recovery of critical minerals like aluminum, aiming to build a massive source of the metal.

16
Vercel — AI dev-tools 1mo ago

Pull anomaly alert details using the Vercel CLI

You can now access anomaly alerts and their details directly through the Vercel CLI. With the vercel alerts command, you can list all alerts for a team or given project. For each alert, you can view the start time, the type of alert, and whether or not the alert is still…

10
arXiv — Machine Learning research 1mo ago

GraphDiffMed: Knowledge-Constrained Differential Attention with Pharmacological Graph Priors for Medication Recommendation

arXiv:2605.20188v1 Announce Type: new Abstract: Recommending safe and effective medication combinations from electronic health records (EHRs) is a core clinical AI problem, yet it remains difficult because patient trajectories are long, noisy, and clinically heterogeneous.…

9
arXiv — Machine Learning research 1mo ago

Catching a Moving Subspace: Low-Rank Bandits Beyond Stationarity

arXiv:2605.20269v1 Announce Type: new Abstract: Many bandit deployments (recommendation, clinical dosing, ad targeting) share two facts prior work handles only in isolation: rewards live on a low-dimensional latent subspace, and that subspace drifts. Stationary low-rank bandits…

14
arXiv — Machine Learning research 1mo ago

TreeText-CTS: Compact, Source-Traceable Tree-Path Evidence for Irregular Clinical Time-Series Prediction

arXiv:2605.20292v1 Announce Type: new Abstract: Numerical time-series models can effectively process irregular electronic health record (EHR) trajectories, but they do not naturally expose the measurements and temporal patterns supporting each risk estimate as readable evidence.…

21
arXiv — Machine Learning research 1mo ago

Causal Unlearning in Collaborative Optimization: Exact and Approximate Influence Reversal under Adversarial Contributions

arXiv:2605.20341v1 Announce Type: new Abstract: Federated learning systems must support data deletion requests to comply with privacy regulations, yet retraining from scratch after each deletion is computationally prohibitive. We present HF-KCU, a method that removes a client's…

17
arXiv — Machine Learning research 1mo ago

SMA-DP: Spectral Memory-Aware Differential Privacy for Deep Learning

arXiv:2605.20450v1 Announce Type: new Abstract: Differentially private stochastic gradient descent (DP-SGD) enables private deep learning through per-example clipping and calibrated Gaussian noise, but its high-variance updates can reduce utility on challenging datasets. We…

4
arXiv — Machine Learning research 1mo ago

CASCADE Conformal Prediction: Uncertainty-Adaptive Prediction Intervals for Two-Stage Clinical Decision Support

arXiv:2605.20468v1 Announce Type: new Abstract: Effective medication management in Parkinson's Disease (PD) is challenging due to heterogeneous disease progression, variable patient response, and medication side effects. While AI models can forecast levodopa equivalent daily…

8
arXiv — Machine Learning research 1mo ago

Deep Learning Surrogates for Emulating Stochastic Climate Tipping Dynamics

arXiv:2605.20580v1 Announce Type: new Abstract: This work explores a dynamics-informed Temporal Fusion Transformer (TFT) as a data-driven surrogate for computationally intensive Earth system simulations. Focusing on multivariate time series describing global ocean transport, we…

38
arXiv — Machine Learning research 1mo ago

AGPO: Adaptive Group Policy Optimization with Dual Statistical Feedback

arXiv:2605.20722v1 Announce Type: new Abstract: Reinforcement learning improves LLM reasoning, but PPO/GRPO typically use fixed clipping and decoding temperature, which makes training brittle and tuning-heavy. We propose Adaptive Group Policy Optimization (AGPO), a critic-free…

23
arXiv — Machine Learning research 1mo ago

PACD-Net: Pseudo-Augmented Contrastive Distillation for Glycemic Control Estimation from SMBG

arXiv:2605.20751v1 Announce Type: new Abstract: Effective diabetes management requires continuous monitoring of glycemic levels. Clinically, glycemic control is assessed using metrics such as Time in Range (TIR), Time Below Range (TBR), and Time Above Range (TAR), typically…

35
arXiv — Machine Learning research 1mo ago

Causal Machine Learning Is Not a Panacea: A Roadmap for Observational Causal Inference in Health

arXiv:2605.20782v1 Announce Type: new Abstract: Objective: The growing availability of large-scale observational clinical datasets and challenges in conducting randomized controlled trials have spurred enthusiasm in using causal machine learning (ML) for causal inference in…

15
arXiv — NLP / Computation & Language research 1mo ago

Do No Harm? Hallucination and Actor-Level Abuse in Web-Deployed Medical Large Language Models

arXiv:2605.20591v1 Announce Type: new Abstract: Medical large language models (LLMs), including custom medical GPTs (MedGPTs) and open-source models, are increasingly deployed on web platforms to provide clinical guidance. However, they pose risks of hallucination, policy…

33
arXiv — NLP / Computation & Language research 1mo ago

Assessing socio-economic climate impacts from text data

arXiv:2605.20793v1 Announce Type: new Abstract: Recent advances in natural language processing (NLP) and large language models (LLMs) have enabled the systematic use of large-scale textual data from news, social media, and reports to create datasets with socio-economic impacts…

15
arXiv — NLP / Computation & Language research 1mo ago

Automated ICD Classification of Psychiatric Diagnoses: From Classical NLP to Large Language Models

arXiv:2605.21154v1 Announce Type: new Abstract: Mental health has become a global priority, leading to a massive administrative burden in the coding of clinical diagnoses. This study proposes the automation of psychiatric diagnostic analysis by mapping free-text descriptions to…

8
arXiv — NLP / Computation & Language research 1mo ago

Reliable Automated Triage in Spanish Clinical Notes: A Hybrid Framework for Risk-Aware HIV Suspicion Identification

arXiv:2605.21256v1 Announce Type: new Abstract: Standard clinical Natural Language Processing (NLP) benchmarks often yield inflated metrics by forcing deterministic classification on ambiguous instances, thereby obscuring the clinical risks of overconfident predictions. To…

34
arXiv — NLP / Computation & Language research 1mo ago

SymbolicLight V1: Spike-Gated Dual-Path Language Modeling with High Activation Sparsity and Sub-Billion-Scale Pre-Training Evidence

arXiv:2605.21333v1 Announce Type: new Abstract: Natively trained spiking language models struggle to combine Transformer-like language quality, stable multi-domain pre-training, and high activation sparsity. We present SymbolicLight V1, a spike-gated dual-path language model…

7
arXiv — NLP / Computation & Language research 1mo ago

NeuroQA: A Large-Scale Image-Grounded Benchmark for 3D Brain MRI Understanding

arXiv:2605.20525v1 Announce Type: cross Abstract: We present NeuroQA, a large-scale benchmark for visual question answering in 3D brain magnetic resonance imaging (MRI), with 56,953 QA pairs from 12,977 subjects across 12 datasets. It spans ages 5-104 and five clinical domains:…

8
Hugging Face Daily Papers research 1mo ago

Rethinking Visual Attribution for Chest X-ray Reasoning in Large Vision Language Models

Abstract: A causal evaluation framework is developed to verify visual evidence grounding in chest X-ray vision-language models, leading to the proposal of MedFocus, a concept-based attribution method that improves clinical trustworthiness through anatomical region localization…

6
Vercel — AI dev-tools 1mo ago

Configure weighted traffic splits for Vercel Flags from the Vercel CLI

You can now configure weighted traffic splits for Vercel Flags with the new vercel flags split command in the Vercel CLI. This allows you to send a percentage of traffic to one variant and the rest to another. Run the command interactively, or pass the environment, bucketing…

23
TechCrunch — AI news-outlet 1mo ago

Clouted wants to take the guesswork out of making short videos go viral

The video clipping startup raised a $7 million seed round led by Slow Ventures.

33
The Information — AI news-outlet 1mo ago

Intuit Lays Off 17% of Staff as Revenue Growth Sinks to Lowest Level Since 2024

Intuit shares dropped around 14% in extended trading on Wednesday as the maker of QuickBooks and TurboTax reported revenue growth dipping to its slowest pace since 2024. Revenue in the fiscal third quarter, which ended April 30, climbed 10% from the previous year to $8.6…

19
LangChain releases dev-tools 1mo ago

langchain-fireworks==1.4.0

Changes since langchain-fireworks==1.3.1: release(fireworks): 1.4.0 (#37582); feat(fireworks): migrate to fireworks-ai 1.x SDK (#37581); chore(model-profiles): refresh model profile data (#37574); chore: bump idna from 3.10 to 3.15 in /libs/partners/fireworks (#37527)…

26
Hugging Face Daily Papers research 1mo ago

ClinSeekAgent: Automating Multimodal Evidence Seeking for Agentic Clinical Reasoning

Abstract: ClinSeekAgent is an automated agentic framework that enables large language models to actively acquire and synthesize multimodal clinical evidence from raw data sources, improving decision-making accuracy in both text-only and multimodal tasks. AI-generated summary…

38
r/LocalLLaMA community 1mo ago

How accurate can “whichllm” be?

Hello people, I think the question is clear, but I wanted to add some context: I work on internal tools at my job, and some of the tools are for us developers (most tools are for marketing and factory production). I am currently working on a small CLI tool that uses a local model…

12
