Tag

Developer Tool

500 articles archived under #developer-tool · RSS

arXiv — Machine Learning research 2h ago

A Filtered Mixture-of-Generators for Fully Synthetic Survival Training

arXiv:2607.00127v1 Announce Type: new Abstract: Survival analysis models time-to-event data, but in clinical settings training data are costly and scarce: events accrue over years of follow-up, cohorts are small, and privacy regulations restrict sharing across institutions.…

26
arXiv — Machine Learning research 2h ago

Entropy-Regularized Probabilistic Gates for Sparse Model Discovery in Scarce-Data Federated Learning

arXiv:2607.00275v1 Announce Type: new Abstract: Federated Learning (FL) is a distributed machine learning (ML) paradigm with collaboration among multiple clients without sharing data. FL is challenging under data heterogeneity and partial client participation. Learning sparse…

14
arXiv — Machine Learning research 2h ago

LLM-Guided ODE Discovery and Parameter Inference from Small-Cohort Aggregate Data

arXiv:2607.00733v1 Announce Type: new Abstract: Mechanistic modeling via ordinary differential equations (ODEs) provides interpretable descriptions of complex dynamics and enables inference of underlying mechanisms, which is particularly valuable in clinical settings. However,…

36
arXiv — Machine Learning research 2h ago

Automatic Detection of Stress from Speech in the Trier Social Stress Test

arXiv:2607.00986v1 Announce Type: new Abstract: Automatically detecting stress in speech provides an unobtrusive way to gain insights relevant to behavioral research or clinical assessment. This study investigates the automatic differentiation between a stressful and…

12
arXiv — NLP / Computation & Language research 2h ago

Selective Test-Time Debiasing for CLIP via Reward Gating

arXiv:2607.00423v1 Announce Type: new Abstract: Vision language models (VLMs) demonstrate strong zero-shot performance, but often perpetuate social stereotypes in person-centric queries, yielding skewed demographic distributions. Current debiasing methods apply uniform bias…

22
arXiv — NLP / Computation & Language research 2h ago

Dynamic Bidirectional Pattern Memory: A Production-Scale Empirical Characterisation of Inference-Time Gating in Clinical NLP

arXiv:2607.00870v1 Announce Type: new Abstract: We study inference-time pattern-memory gating in a production-scale clinical natural language processing (NLP) pipeline. The pipeline pairs a generator (Llama-3.3 70B) proposing extractions with a verifier (MMed-Llama-3.1 70B)…

33
arXiv — NLP / Computation & Language research 2h ago

Clinician-Level Agreement Without Clinical Caution: LLM Evaluator Limits in Medical AI Benchmarking

arXiv:2607.01103v1 Announce Type: new Abstract: Open-response evaluation provides stronger clinical validity than multiple-choice benchmarks but creates a scoring bottleneck that motivates automated LLM-as-a-Judge approaches. Whether such evaluators replicate clinical calibration…

12
Anthropic SDK (Python) releases dev-tools 8h ago

v0.115.1

0.115.1 (2026-07-01) Full Changelog: v0.115.0...v0.115.1 Chores api: remove some nonfunctional types from the SDKs (5e7c431)

27
r/LocalLLaMA community 9h ago

End of an Agony. Real production service that uses LLM to earn money my team had made and now we are so happy that it will die. Here are some of my final "experiences".

Hello everyone. I posted in this sub about making a production service about 8 months ago. Here is the link to my previous post. The idea was the same. We wanted to make a real production service that we can provide to clients to earn money. AI assistant that works through…

30
llama.cpp releases dev-tools 11h ago

b9859

opencl: allow loading precompiled binary kernels from library ( #23042 ) opencl: allow loading binary kernel opencl: add libdl.h ggml-backend-dl is in ggml, which depends backend libs, thus ggml-opencl cannot depend on ggml-backend-dl add libdl.h to break cyclic dep opencl:…

6
arXiv — Machine Learning research 1d ago

Accelerometry-Derived Digital Biomarkers for Cardiometabolic Risk: A Population-Representative Tabular Benchmark with Uncertainty Quantification

arXiv:2606.30702v1 Announce Type: new Abstract: Structured tabular data dominates clinical medicine, yet existing benchmarks fail to reflect real-world properties like complex survey sampling, demographic oversampling, and subgroup fairness. We introduce the NHANES Accelerometry…

31
arXiv — Machine Learning research 1d ago

Mind the Residual Gap: Probabilistic Downscaling under Real-World Bias

arXiv:2606.30821v1 Announce Type: new Abstract: Probabilistic downscaling is the task of modeling the conditional distribution of high-resolution fields given coarse inputs, and is a central challenge to atmospheric science, climate modeling, and other multiscale physical…

21
arXiv — Machine Learning research 1d ago

Teaching LLMs to Recommend and Defer in Underrepresented Epilepsy Care

arXiv:2606.31036v1 Announce Type: new Abstract: Specialist epilepsy expertise is scarce in resource-constrained settings, making LLM-based decision support attractive for frontline clinicians managing longitudinal treatment. Such systems must adapt to local prescribing practice…

12
arXiv — Machine Learning research 1d ago

TRIAGE: Role-Typed Credit Assignment for Agentic Reinforcement Learning

arXiv:2606.32017v1 Announce Type: new Abstract: Agentic reinforcement learning requires assigning credit to environment-facing actions such as searches, clicks, edits, navigation commands, and object interactions. Standard GRPO uses the final verifier outcome as a uniform…

13
arXiv — Machine Learning research 1d ago

Explainable Artificial Intelligence For The Detection and Characterisation of Stage B Heart Failure

arXiv:2606.30665v1 Announce Type: cross Abstract: Stage B heart failure is characterized by asymptomatic structural or functional cardiac abnormalities. Identifying individuals at this stage is clinically important, as early detection may enable targeted interventions to prevent…

20
arXiv — NLP / Computation & Language research 1d ago

Gated Multi-Graph Fusion via Graph Attention Networks for Alzheimer's Disease Detection

arXiv:2606.31186v1 Announce Type: new Abstract: Spontaneous speech is a vital non-invasive biomarker for Alzheimer's Disease (AD), yet many systems overlook non-linear structural disruptions and clinical heterogeneity in pathological language. We propose a Multi-View Gated Graph…

31
arXiv — NLP / Computation & Language research 1d ago

Clinically Structured Rank-Gated LoRA for Cross-Benchmark Medical Question Answering

arXiv:2606.31432v1 Announce Type: new Abstract: Medical multiple-choice question answering requires parameter-efficient adaptation across heterogeneous knowledge domains and reasoning operations. A medication question, a diagnostic decision, a public-health item, and a…

33
arXiv — NLP / Computation & Language research 1d ago

CLExEval: A Human-in-the-Loop Framework for Qualitative Evaluation of LLM Clinical Reasoning

arXiv:2606.31608v1 Announce Type: new Abstract: Large Language Models (LLMs) achieve strong results on many medical benchmarks, but their clinical reasoning remains difficult to evaluate reliably. A central risk is an evaluation illusion: fluent and well-structured explanations…

37
Vercel — AI dev-tools 1d ago

Enforce consistent code for agents and humans with konsistent

konsistent is now open source. konsistent is a CLI linter for TypeScript codebases that enforces structural conventions, giving agents and humans the consistent context they need to implement features correctly. Deterministic, fast, and covers structural patterns that TypeScript…

6
Vercel — AI dev-tools 1d ago

Dry-run deployments with Vercel CLI

You can now preview the framework preset and files that Vercel CLI includes in a deployment before creating one. Run vercel deploy --dry from a linked project: For automation or further inspection, return the complete file manifest as JSON: JSON output includes the detected…

17
Hacker News — AI on Front Page community 1d ago

The labor share of income in the US is at its lowest post-war level

Article URL: https://libertystreeteconomics.newyorkfed.org/2026/06/the-post-covid-decline-in-the-labor-share/ Comments URL: https://news.ycombinator.com/item?id=48734234 Points: 290 # Comments: 234

31
arXiv — Machine Learning research 2d ago

NIVA: A Multimodal Foundation Model for Actionable Earth System Intelligence

arXiv:2606.28546v1 Announce Type: new Abstract: Recent advances in AI-driven weather and climate modeling have improved forecast skill while reducing computational cost. However, existing data-driven approaches are limited in their ability to model coupled Earth system dynamics,…

9
arXiv — Machine Learning research 2d ago

GLACIER: Rethinking Mass Spectrum Prediction as an Object Detection Problem

arXiv:2606.29161v1 Announce Type: new Abstract: Predicting tandem mass spectra (MS/MS) from molecular structures represents a central task in analytical chemistry with direct relevance to clinical metabolomics, systems biology, and adjacent disciplines. In this work, we revisit…

13
arXiv — Machine Learning research 2d ago

SP-CACW: Convergence-Aware Client Weighting for Selfish Personalized Learning

arXiv:2606.29322v1 Announce Type: new Abstract: Collaborative learning is sustainable only when it benefits each participant. Standard federated learning optimizes a global average objective, which can underperform for clients whose data distributions differ substantially from…

35
arXiv — NLP / Computation & Language research 2d ago

A French OSCE Dialogue Dataset and Controllable Virtual Patient System for Clinical Training

arXiv:2606.28526v1 Announce Type: new Abstract: The clinical and communication skills of medical students are commonly assessed through Objective Structured Clinical Examinations (OSCEs), which consist of brief scenario-driven simulations of doctor-patient interactions. However,…

36
arXiv — NLP / Computation & Language research 2d ago

The strength of clinical evidence is recoverable from language model representations but not from their stated grades

arXiv:2606.29034v1 Announce Type: new Abstract: Large language models (LLMs) increasingly summarize clinical evidence, where a claim's weight depends on how strongly it is supported. Yet these models convey confidence poorly, and properties they never state, such as truth, are…

17
arXiv — NLP / Computation & Language research 2d ago

TriageRA-CCF: Source-Side Clinical Confidence and Coverage Signals for Adaptive Rank Budgeting in Medical LLMs

arXiv:2606.29375v1 Announce Type: new Abstract: Medical large language models are commonly adapted with a fixed low-rank budget, even though medical questions differ substantially in confidence, clinical coverage, and cross-domain difficulty. We study adaptive rank budgeting for…

15
arXiv — NLP / Computation & Language research 2d ago

How much of an LLM-generated clinical corpus is actually new? A production-scale measurement of content redundancy for provenance classification

arXiv:2606.29605v1 Announce Type: new Abstract: Clinical machine learning increasingly relies on training corpora generated by large language models (LLMs) rather than annotated by clinicians, and such corpora are described and reused largely on the basis of their reported…

12
arXiv — NLP / Computation & Language research 2d ago

Clinical Reasoning Graphs: Structured Evaluation of LLM Diagnostic Reasoning Reveals Competence Without Consistency

arXiv:2606.29876v1 Announce Type: new Abstract: Modern large language models (LLMs) reach 60-70% diagnostic accuracy on complex clinical case benchmarks, but accuracy alone cannot distinguish stable clinically-grounded reasoning from pattern matching. We introduce clinical…

10
r/LocalLLaMA community 2d ago

Krea-2-Turbo Image Model - Easy to be fully uncensored, but it can also EDIT Images!

I've been super impressed with Krea-2-Turbo. It can generate high quality images in ~3 seconds. The quality is quite good compared to other local AI image gen models. Now, I don't want to make you watch or click a you tube video, so I'll just give these clear instructions on how…

5
r/LocalLLaMA community 2d ago

Anyone else end up building a web access layer for local AI agents?

I've been running local models for most of my experiments, and I kept running into the same issue. The model lives locally, but everything it needs to interact with doesn't. Every new agent ended up with another GitHub client, another Reddit integration, another documentation…

10
Vercel — AI dev-tools 2d ago

Query Speed Insights from the Vercel CLI

You can now query Speed Insights datapoints directly through the Vercel CLI. Using the vercel metrics command, you can pull core Web Vitals (LCP, INP, CLS) and other page performance metrics (FCP, TTFB) based on client-side measurements from real user traffic. By providing a…

9
arXiv — Machine Learning research 3d ago

FoggyTrust: Robust Federated Learning with Hierarchical Trust Networks

arXiv:2606.27622v1 Announce Type: new Abstract: Byzantine-robust federated learning seeks to protect distributed model training from malicious or corrupted clients without requiring access to their private data. FLTrust addresses this challenge by introducing a trusted…

33
arXiv — Machine Learning research 3d ago

OperatorSHAP: Fast and Accurate Shapley Value Estimation for Neural Operators

arXiv:2606.28065v1 Announce Type: new Abstract: Understanding model predictions is essential for physical applications, where outputs often inform safety-critical decisions, such as structural load assessment, weather warnings, and clinical diagnosis. Shapley values satisfy many…

20
arXiv — Machine Learning research 3d ago

CPAgents: Agentic Composite Phenotype Generation for Cardiac Disease Association

arXiv:2606.28179v1 Announce Type: new Abstract: Identifying robust associations between cardiac imaging phenotypes and clinical diseases is fundamental to population-scale cardiovascular research and reliable risk stratification. However, current phenome-wide association studies…

13
arXiv — NLP / Computation & Language research 3d ago

From Black-Box to Clinical Insight: A Multi-Stage Explainable Framework for Speech-Based Cognitive Impairment Detection

arXiv:2606.27973v1 Announce Type: new Abstract: Speech-based cognitive impairment detection offers a noninvasive, accessible alternative to costly biomarker assays, yet transformer-based models remain clinically uninterpretable. We propose a multi-stage explainability framework…

23
arXiv — NLP / Computation & Language research 3d ago

The Signal-Coverage Matrix: Stratifying Type and Semantic Errors in Statement Autoformalization

arXiv:2606.28013v1 Announce Type: new Abstract: Headline type-correctness (TC%) of LLM autoformalization has climbed from ~53% to ~76% in two years, yet this scalar conceals which errors each method resolves. We propose a signal-coverage matrix that crosses the Lean…

23
arXiv — NLP / Computation & Language research 3d ago

Aloe-Vision: Robust Vision-Language Models for Healthcare

arXiv:2606.27500v1 Announce Type: cross Abstract: Large Vision-Language Models (LVLMs) specialized in healthcare are emerging as a promising research direction due to their potential impact in clinical and biomedical applications. However, progress is constrained by the scarcity…

28
Vercel — AI dev-tools 3d ago

xAI Grok audio models now available on Vercel AI Gateway

xAI's audio models are now live on AI Gateway. Realtime voice, text to speech, and speech to text are all available through the AI SDK with the same routing, observability, and spend controls as your other models. These capabilities are available on the AI SDK 7 release.…

11
r/MachineLearning community 4d ago

I silently break training codes or configs so I made pybench [P]

It is like pytest but for statistical tests: it ensures no regression of your metrics at a statistical level. It manages tedious things such as seeds, past benchmark results, ... Simple CLI working like pytest but with benchmarks/ directory instead of tests/: pybench # 1st…

38
r/LocalLLaMA community 5d ago

Hello there! (again) i ported my kokoro enhancements so you can use them in your projects.

i made a web based and python based version of the enhancements i made to kokoro's controls. both are, of course, fully client side. if you have hardware acceleration turned on in your browser, kokoro runs on webgpu at about 40ms per generation. it's really fast. note: the…

36
Vercel — AI dev-tools 5d ago

Query Web Analytics from the Vercel CLI

You can now query Web Analytics datapoints directly through the Vercel CLI. Using the vercel metrics command, you can pull page views, visitors, and custom events for your Vercel projects to analyze traffic, compare trends, and answer questions about site performance. By…

14
r/MachineLearning community 5d ago

Made a free tool that automatically cuts the best clips from long videos — thought this community might find it useful [P]

I edit a lot of long-form content and got tired of scrubbing through hour-long recordings to find the good moments. So I built something to do it. You give it a video file (or a YouTube link), it figures out which parts are actually worth watching, and exports short clips in…

22
arXiv — Machine Learning research 6d ago

Beyond Feedforward Networks: Reentry Neural Systems as the Fundamental Basis of Subjecthood and Intrinsic Safety of Next-Generation AGI

arXiv:2606.26406v1 Announce Type: new Abstract: We propose a complete architectural blueprint for safe artificial general intelligence based on a closed reentry loop (D I cycle). In contrast to feedforward networks, which are directed acyclic graphs (C=0, S=0) incapable of…

37
arXiv — NLP / Computation & Language research 6d ago

Context Recycling for Long-Horizon LLM Inference

arXiv:2606.26105v1 Announce Type: new Abstract: Large language models (LLMs) exhibit strong capabilities in short-context reasoning but degrade in performance over long conversational horizons due to context window limitations and inefficient token usage. We introduce…

27
arXiv — Machine Learning research 6d ago

Dot-Flik: A Scalable Edge AI Architecture for Distributed Insect Monitoring

arXiv:2606.26121v1 Announce Type: cross Abstract: Global insect population declines necessitate scalable, continuous monitoring systems, yet existing vision-based solutions remain constrained by high hardware costs, energy demands, and reliance on centralized processing or cloud…

11
arXiv — NLP / Computation & Language research 6d ago

Comparing BERT Sentence-Pair Classification and Few-Shot LLM Prompting for Detecting Threat and Solution Framing in German Climate News

arXiv:2606.26489v1 Announce Type: new Abstract: News media play a central role in shaping public perceptions of climate change, and whether coverage emphasizes threats or solutions has measurable effects on audience engagement and policy support. Automated detection of these…

23
arXiv — NLP / Computation & Language research 6d ago

SamaVaani: Auditing and Debiasing Multilingual Clinical ASR for Indian Languages

arXiv:2606.26901v1 Announce Type: new Abstract: Automatic Speech Recognition (ASR) is increasingly used to document clinical encounters, yet its reliability in multilingual and demographically diverse Indian healthcare contexts remains largely unknown. In this study, we first…

6
arXiv — NLP / Computation & Language research 6d ago

From Clicks to Intent: Cross-Platform Session Embeddings with LLM-Distilled Taxonomy for Financial Services Recommendations

arXiv:2606.26277v1 Announce Type: cross Abstract: Sequential user behavior modeling is widely adopted in industrial recommender systems; however, significant gaps remain in financial services, where pre-login web interactions and authenticated in-app experiences differ…

24
arXiv — NLP / Computation & Language research 6d ago

Somatic in the East, Psychological in the West?: Investigating Clinically-Grounded Cross-Cultural Depression Symptom Expression in LLMs

arXiv:2508.03247v2 Announce Type: replace Abstract: Prior clinical psychology research shows that Western individuals with depression tend to report psychological symptoms, while Eastern individuals report somatic ones. We test whether Large Language Models (LLMs), which are…

5
