llama.cpp releases
469 articles archived
llama.cpp releases dev-tools 28d ago
b9509
server: avoid unnecessary checkpoint restore when new tokens are present (#24110). The pos_min_thold calculation unconditionally subtracts 1 to ensure at least one token is evaluated for logits when no new…
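A toy model of the b9509 fix, reading the title as "only back off by one token when there is nothing new to evaluate". It assumes (this is not taken from the server code) that the prompt shares a prefix of min(n_cached, n_prompt) tokens with the cache and that any rollback below the end of the cache requires a checkpoint restore. `reuse_threshold` and `needs_checkpoint_restore` are hypothetical names.

```python
def reuse_threshold(n_cached: int, n_prompt: int) -> int:
    """Hypothetical: how many cached tokens can be kept as-is.

    Assumes the prompt shares a prefix of min(n_cached, n_prompt) tokens.
    """
    n_common = min(n_cached, n_prompt)
    has_new_tokens = n_prompt > n_common
    if has_new_tokens:
        # The new tokens get evaluated anyway and will produce the logits.
        return n_common
    # Nothing new: back off by one so the last token is re-run for logits.
    return max(n_common - 1, 0)


def needs_checkpoint_restore(n_cached: int, n_prompt: int) -> bool:
    # Assumption: rolling back below the end of the cache needs a restore.
    return reuse_threshold(n_cached, n_prompt) < n_cached


print(needs_checkpoint_restore(10, 15))  # False: 5 new tokens, keep all 10
print(needs_checkpoint_restore(10, 10))  # True: token 10 is re-run for logits
```

With an unconditional `- 1`, the first call would also return True, which is the kind of unnecessary restore the release title describes.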
llama.cpp releases dev-tools 28d ago
b9505
server : add header to tools/server/server-http.h (#24089). macOS/iOS: macOS Apple Silicon (arm64), macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED, macOS Intel (x64), iOS XCFramework. Linux: Ubuntu x64 (CPU), Ubuntu arm64 (CPU), Ubuntu s390x (CPU), Ubuntu x64 (Vulkan), Ubuntu…
llama.cpp releases dev-tools 28d ago
b9504
cmake: skip cvector-generator and export-lora when CPU backend is disabled (#24053)
llama.cpp releases dev-tools 28d ago
b9503
fix(mtmd): handle Gemma 4 audio projector embedding size (#24091). rm projection_dim from clip_n_mmproj_embd. Co-authored-by: Xuan Son Nguyen son@huggingface.co
llama.cpp releases dev-tools 28d ago
b9500
metal : reduce rset heartbeat from 500ms -> 5ms (#24074)
llama.cpp releases dev-tools 28d ago
b9499
ggml-webgpu: FlashAttention refactor + standardize quantization support (#23834). Start work on flash_attn refactor; Refactor; Split k/v quantization; Refactor and abstract quantization logic for flash_attn and mul_mat; Add quantization support to tile path; formatting; Move to…
llama.cpp releases dev-tools 28d ago
b9498
ggml-cpu: extend RVV quantization vec dot to higher VLENs (#22754). ggml-cpu: add rvv 512b, 1024b impls for iq4_xs. ggml-cpu: refactor; add rvv 512b, 1024b impls for q6_K, i-quants. ggml-cpu: refactor; add 512 and 1024 implementations of tq3_s, iq3_xxs, iq2_s, iq2_xs, iq2_xxs…
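Some rough arithmetic behind the higher-VLEN paths: in RVV, one vector register group holds VLMAX = LMUL × VLEN / SEW elements, so a 256-element block of int8 values (QK_K = 256 is the k-quant super-block size in ggml) needs fewer vector iterations as VLEN grows. This sketch ignores the real kernel structure; the helper names are hypothetical.

```python
QK_K = 256  # weights per k-quant super-block in ggml


def elems_per_register(vlen_bits: int, sew_bits: int = 8, lmul: int = 1) -> int:
    # VLMAX = LMUL * VLEN / SEW
    return lmul * vlen_bits // sew_bits


def iterations_per_block(vlen_bits: int, sew_bits: int = 8) -> int:
    per_reg = elems_per_register(vlen_bits, sew_bits)
    return -(-QK_K // per_reg)  # ceiling division


for vlen in (128, 256, 512, 1024):
    print(vlen, elems_per_register(vlen), iterations_per_block(vlen))
```

For int8 elements this gives 16, 8, 4 and 2 iterations per super-block at VLEN 128, 256, 512 and 1024.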
llama.cpp releases dev-tools 28d ago
b9501: tests : refactor test-save-load-state to accept token input (#24073)
Default prompt is now empty; when not provided, generate n_batch random tokens (useful for models without a tokenizer). Tokenization happens once upfront; pass token vector to test functions. generate_tokens prints token…
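The prompt fallback described for b9501 might look roughly like this. `generate_tokens` is named in the note, but its real signature is not shown; the parameters (including the fixed seed) and `prepare_tokens` are assumptions for illustration.

```python
import random
from typing import List, Optional


def generate_tokens(n_batch: int, n_vocab: int, seed: int = 42) -> List[int]:
    """Hypothetical signature: n_batch random token ids in [0, n_vocab)."""
    rng = random.Random(seed)  # fixed seed so the save and load runs see the same input
    return [rng.randrange(n_vocab) for _ in range(n_batch)]


def prepare_tokens(prompt_tokens: Optional[List[int]], n_batch: int, n_vocab: int) -> List[int]:
    # Tokens are prepared once up front and then passed to each test function.
    if prompt_tokens:
        return list(prompt_tokens)
    # Empty/missing prompt: no tokenizer needed, fall back to random ids.
    return generate_tokens(n_batch, n_vocab)
```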
llama.cpp releases dev-tools 28d ago
b9496
mtmd: fix Gemma 4 unified FPE (#24088)
llama.cpp releases dev-tools 29d ago
b9495
qwen35: use post-norm hidden state for MTP (#24025). rename pre_norm to nextn; fix step35
llama.cpp releases dev-tools 29d ago
b9494
mtmd: enable non-causal vision for gemma 4 unified (#24082)
llama.cpp releases dev-tools 29d ago
b9493
mtmd, model: allow skip build_vit() (#24077). add model nits
llama.cpp releases dev-tools 29d ago
b9491
Avoid PDL race conditions by disabling restrict when PDL is used (#24030). Removes restrict from PDL kernel headers due to incompatibility with PDL. Adds preprocessor directives based on arch in kernel body to add restrict to retain performance on older architectures.…
llama.cpp releases dev-tools 29d ago
b9490
ggml-cpu: use runtime SVE width in FWHT (#24059)
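For reference, a scalar fast Walsh-Hadamard transform (unnormalized). The `lanes` parameter is a hypothetical stand-in for a vector width known only at runtime, as with SVE: the butterflies within a stage are independent, so processing them in chunks of any width gives the same result. This is an illustration of the transform, not the ggml kernel.

```python
from typing import List, Sequence


def fwht(values: Sequence[float], lanes: int = 4) -> List[float]:
    """In-place-style FWHT on a copy; len(values) must be a power of two."""
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            # Handle the h butterflies of this group `lanes` at a time,
            # the way a vector unit of that width would.
            for j0 in range(i, i + h, lanes):
                for j in range(j0, min(j0 + lanes, i + h)):
                    x, y = a[j], a[j + h]
                    a[j], a[j + h] = x + y, x - y
        h *= 2
    return a
```

Applying the unnormalized transform twice scales the input by n, which is a convenient sanity check.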
llama.cpp releases dev-tools 29d ago
b9489
cuda: reserve space for quantize kv-cache at startup (#23907). address review comments; remove forward decl; Co-authored-by: Johannes Gäßler johannesg@5d6.de; remove assert in ggml-cuda.cu; Co-authored-by: Johannes Gäßler…
llama.cpp releases dev-tools 29d ago
b9488
tests : add support for qwen3 SSM archs (#24031). arch : add LLM_KV_ATTENTION_RECURRENT_LAYERS; cont : naming + TODOs
llama.cpp releases dev-tools 29d ago
b9486
ci : disable ccache for msvc windows release jobs (#23911)
llama.cpp releases dev-tools 29d ago
b9485
arg : removed unnecessary mmproj download when users pass --no-mmproj (#23425)
llama.cpp releases dev-tools 29d ago
b9484
opencl: use flat variants of q4_K and q6_K gemv for very large M (#24006)
llama.cpp releases dev-tools 29d ago
b9483
hexagon: profiler output fix and script updates (#24042). hex-ops: fix profiler output (i.e., remove the redundant NONEs); hex-prof: update profiling script to support tot.usec column
llama.cpp releases dev-tools 29d ago
b9482
model: add Mellum architecture (#23966). model: support for Mellum architecture; model: improve mellum.py formatting; model: improve mellum.py formatting once again; deps: downgrade transformers to 4.57.6 (to fix CI); deps: remove huggingface_hub dependency; deps: remove…
llama.cpp releases dev-tools 1mo ago
b9481
model : support granite multilingual embeddings R2 (ibm-granite/granite-embedding-{97,311}m-multilingual-r2) (#22716). Add support for the ibm-granite/granite-embedding-{97m,311m}-multilingual-r2 embedding models: Added a version of the gpt4o tokenizer that has a fixed regex…
llama.cpp releases dev-tools 1mo ago
b9480
StepFun 3.5 MTP (#23274). Simplify to single layer; Rollback core changes; fix flake8 errors; Remove scripts; modify to convention; Apply suggestions from code review; Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com; dos2unix; Co-authored-by: Sigbjørn…
llama.cpp releases dev-tools 1mo ago
b9479
common : fix state save in common_prompt_batch_decode (#23468). This commit addresses a bug in common_prompt_batch_decode that affects the session state store/restore in completion.cpp and save-load-state.cpp. The motivation…
llama.cpp releases dev-tools 1mo ago
b9478
server: add SSE ping interval (#24013)
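A sketch of an SSE keep-alive ping, assuming the ping is sent as an SSE comment line (a line starting with `:`, which clients ignore). The frame the server actually emits and how its interval is configured are not stated in the note; `with_pings` and `PING_FRAME` are hypothetical. Timestamps are passed in explicitly so the logic is deterministic; a real server would drive this from a timer.

```python
from typing import Iterable, Iterator, Tuple

PING_FRAME = ": ping\n\n"  # assumption: an SSE comment line used as keep-alive


def with_pings(events: Iterable[Tuple[float, str]], interval: float) -> Iterator[str]:
    """Yield SSE frames, inserting a ping for every `interval` seconds of silence.

    `events` holds (timestamp_seconds, payload) pairs in increasing time order;
    the stream is assumed to start at t = 0.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    last_sent = 0.0
    for t, payload in events:
        while t - last_sent >= interval:  # idle long enough: send a keep-alive
            last_sent += interval
            yield PING_FRAME
        yield f"data: {payload}\n\n"
        last_sent = t


frames = list(with_pings([(1.0, "a"), (7.0, "b")], interval=2.5))
print(frames)
```

Here the gap between the two events (1.0 s to 7.0 s) produces pings at 3.5 s and 6.0 s.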
llama.cpp releases dev-tools 1mo ago
b9474
ui: Add Thinking mode toggle with reasoning effort levels + improvements for Chat Form Add Action UI (#23434). feat: Add "Thinking" toggle and status icon + redesign Chat Form Actions Add panel; test: Update test reference; fix: Icon; fix: E2E test command; fix: wait for greeting…
llama.cpp releases dev-tools 1mo ago
b9473
kv-cache : SWA checkpoints store only non-masked cells (#23981)
llama.cpp releases dev-tools 1mo ago
b9471
llama : deprecate llama_set_warmup (#24009). cont : fix type. Co-authored-by: Daniel Bevenius daniel.bevenius@gmail.com
llama.cpp releases dev-tools 1mo ago
b9470
hexagon: MUL_MAT, MUL_MAT_ID, FLASH_ATTN and GDN cleanup and optimizations for latest models (#23989). hex-mm: initial support for F32 * F32 -> F32 matmuls; hex-rms-norm: fix src1 stride use in fused rms_norm_mul; hex-ops: clear spad pointers in the ops that clobber it. This fixes…
llama.cpp releases dev-tools 1mo ago
b9469
hexagon: add gelu_quick (#24007)
llama.cpp releases dev-tools 1mo ago
b9468
server: real-time reasoning interruption via control endpoint (#23971). Builds on the manual reasoning budget trigger from #23949. Adds a CONTROL task that mirrors the CANCEL path on the live slot and calls…
llama.cpp releases dev-tools 1mo ago
b9467
clean up unused variables warnings (#23975)
llama.cpp releases dev-tools 1mo ago
b9466
opencl: fix compiler warnings for non-adreno path (#23922). opencl: fix const cast warning
llama.cpp releases dev-tools 1mo ago
b9464
speculative : fix n_outputs_max and remove draft-simple auto-enable (#23988). speculative : add common_speculative_n_max helper function. Extract the speculative max-draft-size logic from server_n_outputs_max into a reusable common_speculative_n_max() function in…
llama.cpp releases dev-tools 1mo ago
b9460
llama: limit max outputs of llama_context (#23861). llama: save more VRAM by reserving n_outputs == n_seqs when possible; add n_outputs_per_seq; move n_outputs_max to server-context; change ubatch to batch everywhere
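Rough arithmetic for why reserving `n_outputs == n_seqs` can save memory, assuming the reserved output buffer holds f32 logits, one row of `n_vocab` values per output. The example values (`n_batch = 2048`, 4 sequences, a 151,936-entry vocabulary) are illustrative and not from the release; `n_outputs_max` here is a hypothetical helper, not the llama.cpp function of that name.

```python
def n_outputs_max(n_batch: int, n_seqs: int, n_outputs_per_seq: int = 1) -> int:
    # Hypothetical: never reserve more output rows than the batch can produce.
    return min(n_batch, n_seqs * n_outputs_per_seq)


def logits_bytes(n_outputs: int, n_vocab: int, bytes_per_logit: int = 4) -> int:
    # Assumes f32 logits: one row of n_vocab floats per output.
    return n_outputs * n_vocab * bytes_per_logit


n_batch, n_seqs, n_vocab = 2048, 4, 151_936
full = logits_bytes(n_batch, n_vocab)
reduced = logits_bytes(n_outputs_max(n_batch, n_seqs), n_vocab)
print(f"{full / 2**30:.2f} GiB -> {reduced / 2**20:.2f} MiB")
```

Under these assumptions the reservation drops from about 1.16 GiB (one row per batch slot) to about 2.32 MiB (one row per sequence).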
llama.cpp releases dev-tools 1mo ago
b9459
metal: template GLU kernels to support f16/f32 (#23882). Drops the hardcoded f32 GLU kernels in favor of a single template. We now load/store in the native tensor type (half or float) to save memory bandwidth, but keep the actual ALU compute in float to avoid exploding math in…
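A scalar illustration of "load/store in half, compute in float", using SwiGLU (silu(x) * g, one GLU variant) as the example; which variants the Metal template covers is not listed here. Python's `struct` `'e'` format rounds to IEEE 754 binary16, and Python floats (doubles) stand in for the kernel's f32 arithmetic. The helper names are hypothetical.

```python
import math
import struct

HALF_MAX = 65504.0  # largest finite IEEE 754 binary16 value


def to_half(x: float) -> float:
    """Round to the nearest binary16 value; out-of-range values become +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def swiglu_half(xs, gs):
    """silu(x) * g with half-precision storage and float compute."""
    out = []
    for x, g in zip(xs, gs):
        xf, gf = to_half(x), to_half(g)             # load: values as stored in half
        out.append(to_half(xf * sigmoid(xf) * gf))  # compute in float, store as half
    return out


# A naive half-precision sigmoid would need exp(12) (about 1.6e5) for x = -12,
# which exceeds HALF_MAX; computing in float avoids that intermediate overflow.
print(math.exp(12) > HALF_MAX)  # True
print(swiglu_half([1.0, -12.0], [2.0, 1.0]))
```

Memory traffic stays at 2 bytes per element while the exponent and division run at higher precision.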
llama.cpp releases dev-tools 1mo ago
b9458
vulkan: don't hold the device mutex while compiling pipelines (#23641). We need to hold a lock while we traverse all pipelines and lazily initialize them, but we don't need to hold it while the pipeline is being…
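The general pattern behind this change, sketched in Python with `threading` rather than the Vulkan backend's C++: look up work under the lock, run the expensive compile without it, then publish the result under the lock again. `PipelineCache` and `compile_fn` are hypothetical names. If two threads race on the same pipeline, both may compile it and the first stored result wins, which trades occasional duplicate work for a much shorter critical section.

```python
import threading
from typing import Callable, Dict


class PipelineCache:
    """Lazily compiles named pipelines without holding the lock during compilation."""

    def __init__(self, compile_fn: Callable[[str], object]):
        self._lock = threading.Lock()
        self._pipelines: Dict[str, object] = {}
        self._compile_fn = compile_fn

    def get(self, name: str) -> object:
        with self._lock:  # short critical section: lookup only
            pipeline = self._pipelines.get(name)
        if pipeline is not None:
            return pipeline
        compiled = self._compile_fn(name)  # slow work, lock released
        with self._lock:  # publish, keeping any result stored by a faster thread
            return self._pipelines.setdefault(name, compiled)
```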
llama.cpp releases dev-tools 1mo ago
b9457
vulkan: reduce host memory lock contention (#23376). vulkan: reduces lock contention; replace unique_lock with lock_guard
llama.cpp releases dev-tools 1mo ago
b9455
TP: quantized KV cache support (#23792). fix partial view; remove overly strict assert
llama.cpp releases dev-tools 1mo ago
b9453
model: Add EXAONE 4.5 implementations (#21733). Add EXAONE 4.5 and Add GQA for MMproj; mtmd: EXAONE 4.5 vision markers and projector path. EXAONE 4.5 uses and for image boundaries; Qwen keeps <|vision_start|> and <|vision_end|>. Route EXAONE 4.5 through the Qwen2.5-VL-style…
llama.cpp releases dev-tools 1mo ago
b9452
vulkan: Block-load Q3_K/Q6_K block data and subtract on 32b ints (#23056). Q2_K/Q3_K/Q6_K do much better when using MMVQ on Intel BMG even though they're only 2-byte aligned, and Q3_K still wins on NVIDIA as well. mesa isn't all that great at coalescing back-to-back loads from…
llama.cpp releases dev-tools 1mo ago
b9451
vulkan: Removed unused functions (#23175)
llama.cpp releases dev-tools 1mo ago
b9445: ci: remove redundant or duplicate jobs (#23927)
remove redundant apple job; openvino gpu and cpu test can share the same build and machine; Update build-rpc.yml; Update build-openvino.yml; cpu any doesn't make sense as we have an arm job already, so do high perf on both x86 and arm; remove duplicate x86 vulkan; combine backend…
llama.cpp releases dev-tools 1mo ago
b9444
server : handle If-None-Match weak ETags (#23916)
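Background for b9444: HTTP semantics (RFC 9110) require weak comparison for `If-None-Match`, so `W/"abc"` and `"abc"` match, and `*` matches any current representation; on a GET, a match lets the server answer 304 Not Modified. A minimal matcher under those rules; the function name is hypothetical and this is not the server's implementation.

```python
import re

# An entity-tag is an optional W/ prefix followed by a quoted opaque-tag.
_ETAG = re.compile(r'(?:W/)?("[^"]*")')


def if_none_match_hits(header: str, current_etag: str) -> bool:
    """Weak comparison of an If-None-Match header value against the current ETag."""
    if header.strip() == "*":
        return True
    current = _ETAG.fullmatch(current_etag.strip())
    if current is None:
        return False
    # Weak comparison ignores the W/ prefix on both sides; findall returns
    # only the quoted opaque-tags from the (possibly comma-separated) header.
    return current.group(1) in _ETAG.findall(header)
```

Matching quoted tags with a regex, rather than splitting on commas, keeps tags that themselves contain commas intact.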
llama.cpp releases dev-tools 1mo ago
b9442
vocab : add tokenizer support for jina-embeddings-v2-base-zh (#18756). vocab : add jina-embeddings-v2-base-zh (whitespace tokenizer); lowercase defaults to true; type fix. Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
llama.cpp releases dev-tools 1mo ago
b9441
ui: fix ETag truncation with MSVC compiler (#23917)
llama.cpp releases dev-tools 1mo ago
b9439
llama: only use one iGPU device by default (#23897)
llama.cpp releases dev-tools 1mo ago
b9438: webui: add custom CSS injection via config (#23904)
register a customCSS setting in the Developer section under Custom JSON, syncable so it rides the existing ui-config pass-through. Inject the value into a single style element in the head, reactive on the setting. Lets an operator theme…
llama.cpp releases dev-tools 1mo ago
b9437
Support -fa auto in llama-bench (#23714). Make the default value of -ngl -1, similar to other tools. Update README with latest usage and examples. Address review comments.