Tag

Developer Tool

500 articles archived under #developer-tool

r/LocalLLaMA community 10d ago

I pretrained and post trained a 500M parameter LLM and 330M parameter Image generator from scratch

Hey folks, hope you are doing well. I started HobbyLM as a side project last month. Initially I wrote an agent harness using the Claude SDK which takes notes on various LLM architectures and does ablation studies to find an optimised or well-fit architecture for this model training, then I…

16
r/MachineLearning community 11d ago

Python packages for particle swarms, genetic algorithms. Scikit-opt maybe? [D]

I'm working with a client on a curve-fitting optimization problem. They are currently using a constrained Levenberg-Marquardt optimizer for their task, which is complex, slow, and sometimes gets stuck in local minima. I suggested using particle swarm optimization (PSO), and the…

17
r/LocalLLaMA community 11d ago

Qwen code companion on vscode marketplace - thoughts

I just came across this extension in vscode a few days ago and tried to use it with LM Studio hosted models, and it really is pretty good compared to `continue`, `kilo`, `cline`, `roo`. I felt that, without much tweaking, it gets straight to the point; if any tweaks are required you could do…

36
r/LocalLLaMA community 11d ago

Gemma 4 26b a4b is genuinely the best model I have tried for language learning and scientific queries!

I know gemma 4 26b is (according to this sub) a bit behind for coding tasks but for language learning and scientific (health/biology/medical/clinical/biochem) queries it’s unbeaten even by Qwen 3.5/3.6. Since the competition in the small MOE models is generally between Qwen…

28
Simon Willison community 12d ago

Quoting Sean Lynch

The real valuable capability MCP offers over skills/CLI is isolating the auth flow outside of the agent’s context window, and potentially out of the harness completely. [...] Maybe the idealized form of MCP is just an auth gateway for the API and nothing else. That’d still be a…

8
llama.cpp releases dev-tools 12d ago

b9730

mtmd, arg: fix utf8 handling on windows (#24779): also fix ggml_fopen; fix build fail; also fix CLI. macOS/iOS: macOS Apple Silicon (arm64), macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED, macOS Intel (x64), iOS XCFramework. Linux:…

36
Hugging Face Daily Papers research 12d ago

Configurable Clinical Information Extraction with Agentic RAG: What Works, What Breaks, and Why

Abstract: ACIE, an agentic RAG system deployed in a clinical setting, demonstrates high accuracy in extracting medical information from complex patient contexts, achieving a 96.5% acceptance rate by nuclear-medicine physicians across 7,326 judgments. Generated by…

5
arXiv — Machine Learning research 13d ago

cAPM: Continual AI-Assisted Pace-Mapping with Active Learning

arXiv:2606.19373v1 Announce Type: new Abstract: Ventricular tachycardia is a life-threatening rhythm disorder and a major cause of sudden cardiac death. Pace-mapping is a clinical procedure for identifying the intervention target during catheter ablation of VT. It requires…

15
arXiv — Machine Learning research 13d ago

Insulin4RL: Real-Time Insulin Management in the Intensive Care Unit for Offline Reinforcement Learning

arXiv:2606.19481v1 Announce Type: new Abstract: Offline reinforcement learning (ORL) offers the potential to improve the quality of clinical decision-making using historical electronic health record (EHR) data. Current training and evaluative practices in this field rely heavily…

10
arXiv — Machine Learning research 13d ago

Federated Bilevel Performative Prediction

arXiv:2606.19734v1 Announce Type: new Abstract: Federated bilevel optimization is widely used for nested learning problems across distributed clients, such as federated hyperparameter tuning and meta-learning under privacy and communication constraints. Most existing…

7
arXiv — Machine Learning research 13d ago

When, Where, and How: Adaptive Binning for Tabular Self-Supervised Learning

arXiv:2606.19827v1 Announce Type: new Abstract: Medical tabular data are ubiquitous in clinical research, but deep learning for tables remains underexplored because reliable labels often require costly expert adjudication, even though structured clinical variables are routinely…

21
arXiv — Machine Learning research 13d ago

Exploring the potential of AlphaEarth and TESSERA embeddings for Fine-scale Local Climate Zone Mapping: A case study across five cities in Switzerland

arXiv:2606.20034v1 Announce Type: new Abstract: Understanding urban spatial morphology is critical for climate modeling, risk assessment, and sustainable urban design, and Local Climate Zone (LCZ) mapping provides the basic framework for this. However, many cities still use…

10
arXiv — Machine Learning research 13d ago

Constrained hybrid modelling to predict microbial dynamics and organic matter turnover in soil systems

arXiv:2606.20329v1 Announce Type: new Abstract: Soil microorganisms control organic matter cycling and largely determine how soil systems can cope with and mitigate climate change and environmental threats. Representing microbial dynamics in process-based soil models is…

17
arXiv — NLP / Computation & Language research 13d ago

Before the Labels: How Dataset Construction Shapes Suicidality Detection in Clinical Text

arXiv:2606.19637v1 Announce Type: new Abstract: Clinical NLP increasingly relies on electronic health record (EHR) data to detect suicidal behaviors, treating clinical documentation as more reliable ground truth than social media. We argue that this framing obscures how…

36
arXiv — NLP / Computation & Language research 13d ago

Prompt, Plan, Extract: Zero-Shot Agentic LLMs Workflows for Lung Pathology Extraction from Clinical Narratives

arXiv:2606.19852v1 Announce Type: new Abstract: Information extraction from pathology reports is essential for cancer staging and tumor registry population. Yet key data remains embedded in narrative reports, making manual extraction labor-intensive and error-prone. Traditional…

26
arXiv — NLP / Computation & Language research 13d ago

Source-Grounded Data Generation for Text-to-JSON Learning

arXiv:2606.20072v1 Announce Type: new Abstract: From financial filings to clinical records, legacy industries rely heavily on long, unstructured documents to store high-value information. Reliably extracting this information into structured, machine-readable representations is a…

4
arXiv — NLP / Computation & Language research 13d ago

MedRLM: Recursive Multimodal Health Intelligence for Long-Context Clinical Reasoning, Sensor-Guided Screening, Evidence-Grounded Decision Support, and Community-to-Tertiary Referral Optimization

arXiv:2606.20164v1 Announce Type: new Abstract: Real-world clinical decision support requires reasoning over heterogeneous and longitudinal patient information rather than answering isolated medical questions. However, current medical large language models and…

29
arXiv — NLP / Computation & Language research 13d ago

Beyond the GUI Paradigm: Do Mobile Agents Need the Phone Screen?

arXiv:2606.19388v1 Announce Type: cross Abstract: Recent advances in mobile agents are dominated by the GUI paradigm, in which agents perceive UI information and emit screen interactions. However, mobile platforms also expose a command-line interface (CLI) that provides direct…

31
arXiv — NLP / Computation & Language research 13d ago

AgentFinVQA: A Deployable Multi-Agent Pipeline for Auditable Financial Chart QA

arXiv:2606.19782v1 Announce Type: cross Abstract: Financial chart question answering in regulated settings demands more than accuracy: practitioners must know which answers to trust before acting on them, and many institutions cannot send client data to external model providers.…

10
llama.cpp releases dev-tools 13d ago

b9713

mtmd: add batching for mtmd-cli, add video tests (#24778). macOS/iOS: macOS Apple Silicon (arm64), macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED, macOS Intel (x64), iOS XCFramework. Linux: Ubuntu x64 (CPU), Ubuntu arm64 (CPU), Ubuntu s390x (CPU), Ubuntu x64 (Vulkan), Ubuntu…

22
llama.cpp releases dev-tools 13d ago

b9701

mtmd: refactor preprocessor, add mtmd_image_preproc_out (#24736): add mtmd_image_preproc_out; add dev docs; remove unused clip API; rm unused clip_image_f32_batch::grid; change preprocess() call signature. macOS/iOS: macOS Apple Silicon (arm64), macOS Apple Silicon (arm64, KleidiAI…

15
arXiv — Machine Learning research 14d ago

ThousandWorlds: A benchmark for climate emulation of potentially habitable exoplanets

arXiv:2606.18338v1 Announce Type: new Abstract: The search for life beyond Earth will depend on detecting faint signatures in the atmospheres of potentially habitable exoplanets. Interpreting those signatures requires understanding the host planet's climate: the same molecule…

23
arXiv — Machine Learning research 14d ago

SCOPE-FL: A Strategy-proof Chain-based Optimal pareto efficient Federated Learning System

arXiv:2606.18384v1 Announce Type: new Abstract: Hierarchical Federated Learning (HFL) enables scalable collaborative model training across distributed devices while preserving data privacy. However, existing HFL client selection mechanisms suffer from a fundamental strategic…

31
arXiv — Machine Learning research 14d ago

A Cross-Model VLM-Judge Protocol for Single-Image 3D Mesh Quality (and Why Cheap Proxies Fall Short)

arXiv:2606.18451v1 Announce Type: new Abstract: Single-image-to-3D generators are improving quickly, but there is no agreed, human-free way to tell whether one generated mesh is better than another. Practitioners commonly rely on cheap automatic proxies (render-space CLIP…

32
arXiv — Machine Learning research 14d ago

Beyond AHI: An Interpretable Causal-Discovery-Guided Framework for Sleep Recovery in Connected Health

arXiv:2606.18506v1 Announce Type: new Abstract: Objective sleep assessment relies on polysomnography (PSG), yet clinical impact is often better reflected in patient-reported outcomes (PROs) such as sleepiness and fatigue. Existing summary indices, including the Apnea-Hypopnea…

34
arXiv — Machine Learning research 14d ago

PSyGenTAB: A Privacy-Preserving Framework for Synthetic Clinical Tabular Data Generation via Constrained Optimization

arXiv:2606.18518v1 Announce Type: new Abstract: The development of medical AI is constrained by limited access to high-quality clinical data due to institutional silos and strict privacy regulations such as HIPAA and GDPR. Synthetic data generation offers a potential solution,…

4
arXiv — NLP / Computation & Language research 14d ago

Fair Cognitive Impairment Detection Through Unlearning

arXiv:2606.18571v1 Announce Type: cross Abstract: Mild Cognitive Impairment (MCI) is a medical condition characterized by a noticeable decline in memory, language, or thinking abilities. MCI detection from spontaneous speech is promising for scalable screening. However, learned…

33
arXiv — Machine Learning research 14d ago

ChronoSurv: A Clinical Pathway-Guided Graph Framework for Multimodal Survival Analysis

arXiv:2606.19140v1 Announce Type: new Abstract: Accurate survival prediction is essential for personalized treatment planning in head and neck cancer, yet remains challenging due to the heterogeneous and high-dimensional nature of multimodal clinical data. While deep survival…

32
arXiv — NLP / Computation & Language research 14d ago

Possible or Definite? A Benchmark for Evaluating Diagnostic Uncertainty Preservation in Clinical Text

arXiv:2606.18471v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used for clinical text tasks such as summarization and revision. While most studies evaluate the fluency and coherence of LLM-generated text, whether LLMs correctly preserve diagnostic…

11
arXiv — NLP / Computation & Language research 14d ago

Are LLMs Ready to Assist Physicians? PhysAssistBench for Interactive Doctor-Patient-EHR Assistance

arXiv:2606.18613v1 Announce Type: new Abstract: The most plausible near-term role of medical LLMs is to assist rather than replace physicians, yet current evaluations often test isolated capabilities: clinical knowledge, EHR system interaction, or patient communication.…

7
arXiv — NLP / Computation & Language research 14d ago

Beyond Scalar Scores: Exploring LLM-based Metrics for Clinical Significance Evaluation in Radiology Reports

arXiv:2606.18797v1 Announce Type: new Abstract: Reliable evaluation of generated radiology reports requires strict clinical accuracy, as omitted critical findings or mischaracterized radiographic observations can directly affect patient care. Existing metrics obscure this…

23
arXiv — NLP / Computation & Language research 14d ago

Language Models as Interfaces, Not Oracles: A Hybrid LLM-ML System for Pediatric Appendicitis

arXiv:2606.19183v1 Announce Type: new Abstract: Large language models (LLMs) can make clinical decision support more accessible by interpreting free-text documentation, but their direct use as diagnostic engines is limited by sensitivity to prompts, information order, and…

33
llama.cpp releases dev-tools 14d ago

b9688

server: (router) add model management API (#23976): wip; server: (router) add SSE realtime updates API; nits; wip; add download API; add download api; update docs; add delete endpoint; fix std::terminate; fix crash; fix 2; add tests; nits. macOS/iOS: macOS Apple Silicon (arm64), macOS Apple…

17
arXiv — Machine Learning research 15d ago

Informative Missingness to Generate Irregular Clinical Time Series

arXiv:2606.17106v1 Announce Type: new Abstract: Laboratory tests in electronic health records are collected irregularly, and the absence of a test order can be as informative as the measurement itself. Such missingness reflects clinicians' decisions and patient physiology,…

8
arXiv — Machine Learning research 15d ago

SpatioTemporal Causal Network Diagnostics for Geographic Tipping Point Early Warning

arXiv:2606.17553v1 Announce Type: new Abstract: Geographic tipping points in ecosystems, climate subsystems, or ice sheets pose severe challenges for localized early warning. Classical spatial indicators such as Moran's I summarize global spatial structure, but they struggle…

25
arXiv — Machine Learning research 15d ago

Multiple cyclicity and Wavelet Decomposition with Channel Correlation for Long-term Time Series Forecasting

arXiv:2606.17996v1 Announce Type: new Abstract: Cyclicity and trend are important components of time series data and many studies based on cyclicity and trend have achieved good results in long-term time series forecasting. However, we believe that current work neglects the…

37
arXiv — Machine Learning research 15d ago

RadSEM: A Finding-by-Finding Metric for Clinical Consistency in Radiology Reports

arXiv:2606.17062v1 Announce Type: cross Abstract: Radiology report evaluation must distinguish clinical compatibility from surface similarity, because negation, laterality, or normal-abnormal polarity can reverse a finding. We propose RadSEM (Radiology Sentence-Level Evaluation…

11
arXiv — Machine Learning research 15d ago

KFTD: Koopman-Fourier Time-Differentiable Network for Continuous Ocean Spatiotemporal Forecasting

arXiv:2606.17070v1 Announce Type: cross Abstract: Accurate oceanic forecasting is critical for climate monitoring and disaster early warning. However, ocean spatiotemporal forecasting encounters the double challenges of modeling complex dynamical systems and ensuring…

10
arXiv — NLP / Computation & Language research 15d ago

AIPatient Arena: EHR-grounded evaluation of large language models in end-to-end clinical consultation workflows

arXiv:2606.17474v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly considered for use in clinical consultation tasks, yet most medical evaluations remain static, single-turn, or narrowly outcome-based, limiting their ability to reflect the sequential,…

17
arXiv — NLP / Computation & Language research 15d ago

The Slop Paradox: How Synthetic Standardization Erodes Clinical Uncertainty and Cross-Modal Alignment in AI-Rewritten Radiology Reports

arXiv:2606.17791v1 Announce Type: new Abstract: AI-assisted clinical documentation tools increasingly summarize, standardize, and reformat radiology reports using large language models (LLMs). We present a controlled measurement of the resulting information degradation. Using…

24
arXiv — NLP / Computation & Language research 15d ago

When Multiple Scripts Matter: Evaluating ASR in Clinical Settings

arXiv:2606.17826v1 Announce Type: new Abstract: Automatic speech recognition (ASR) in non-English clinical settings is challenged by multiscript variability, where the same term may appear in multiple valid orthographic forms. Conventional string-matching evaluation metrics…

20
arXiv — NLP / Computation & Language research 15d ago

RubricsTree: Scalable and Evolving Open-Ended Evaluation of Personal Health Agents across Health Memory and Medical Skills

arXiv:2606.18203v1 Announce Type: new Abstract: The LLM-empowered personal health agents with user health (sensor) metrics have offered a promising pathway to alleviate global disparities in healthcare access. However, large-scale clinical deployment remains constrained by an…

28
arXiv — NLP / Computation & Language research 15d ago

SpeechDx: A Multi-Task Benchmark for Clinical Speech AI

arXiv:2606.17339v1 Announce Type: cross Abstract: Speech offers a uniquely informative window into health by simultaneously engaging neurological, motor, respiratory, and vocal systems. Current clinical speech AI methods have largely progressed through isolated…

15
arXiv — NLP / Computation & Language research 15d ago

Beyond Native Success: Auditing Deployment-Interface Exposure of CLIP Backdoors

arXiv:2606.17815v1 Announce Type: cross Abstract: Contrastive Language-Image Pre-training models are widely reused across downstream interfaces, including feature extraction, retrieval, reranking, and selection. Existing CLIP backdoor, however, usually validate attacks on a…

11
arXiv — NLP / Computation & Language research 15d ago

Reading between the Lines: Leveraging Large Language Models for Global Dementia and Depression Assessment from Clinical Interviews

arXiv:2606.18019v1 Announce Type: cross Abstract: Dementia and depression are the most prevalent neuropsychiatric disorders in geriatric populations, and their overlapping symptoms pose major challenges for differential diagnosis. In this study, we investigate open-weights Large…

30
arXiv — NLP / Computation & Language research 15d ago

ProvenanceGuard: Source-Aware Factuality Verification for MCP-Based LLM Agents

arXiv:2606.18037v1 Announce Type: cross Abstract: Tool-using LLM agents increasingly use the Model Context Protocol (MCP) to answer from heterogeneous evidence sources, including search, APIs, databases, clinical records, and formulary tools. Standard factuality metrics usually…

27
arXiv — NLP / Computation & Language research 15d ago

MedicalAgentsBench for Complex Medical Reasoning: Comparing Internalized Reasoning Models versus Externalized Agent-based Frameworks

arXiv:2503.07459v3 Announce Type: replace Abstract: Complex medical reasoning requires integrating heterogeneous clinical evidence across multiple inference steps. Large language models (LLMs) now approach this through two routes: internalized reasoning and externalized agent…

19
Simon Willison community 15d ago

<click-to-play> — a still that plays

Tool: a progressive enhancement Web Component that turns this markup: <click-to-play> <a href="URL to GIF"> <img src="URL to first frame" alt="..."> </a> </click-to-play> into a still frame with a click-to-play button which loads the GIF on…

34
Hugging Face Daily Papers research 15d ago

TRIAGE: Dialectical Reasoning for Explainable Risk Prediction on Irregularly Sampled Medical Time Series with LLMs

Abstract: A framework called TRIAGE is proposed to improve clinical early warning systems by training large language models to generate dialectical reasoning for continuous risk scoring with better calibration and interpretability. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

29
Vercel — AI dev-tools 15d ago

CLI deployment limits removed

We've removed CLI-specific deployment limits, making it easier to deploy from local machines and external CI/CD pipelines with instant feedback. Teams and AI agents can now deploy at the pace their workflows demand. Learn more about limits in the documentation.

5
