llama.cpp releases
469 articles archived
-
llama.cpp releases dev-tools 1mo ago
b9353
server : fix the log message when using SSL ( #23393 ) When llama-server is started with SSL key and cert, the log says that it listens on http instead of https. This patch fixes this. macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS…
7 -
llama.cpp releases dev-tools 1mo ago
b9352
ggml-zendnn : fixed naming of matmul function ( #20964 ) ggml-zendnn: fixed naming of matmul function ggml-zendnn: fixed naming of mul_mat_id function ggml-zendnn: fixed print in mul_mat_id Co-authored-by: plotnikov.v10 plotnikov.v10@wb.ru macOS/iOS: macOS Apple Silicon (arm64)…
4 -
llama.cpp releases dev-tools 1mo ago
b9351
macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64 (ROCm 7.2) Ubuntu x64 (OpenVINO) Ubuntu x64…
20 -
llama.cpp releases dev-tools 1mo ago
b9334
CUDA: missing PDL sync for FWHT, better fallback ( #23690 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…
19 -
llama.cpp releases dev-tools 1mo ago
b9333
metal : add apple device id ( #23566 ) Co-authored-by: lvyichen lvyichen@stepfun.com macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64…
14 -
llama.cpp releases dev-tools 1mo ago
b9331
ci : reduce PR jobs by matching backend paths ( #23675 ) ci : disable SYCL f16 builds ci : extract android and hip into separate workflows ci : move webgpu to separate workflow ci : move the rpc to a separate workflow ci : extract s390x and ppcl jobs ci : extract opencl job into…
24 -
llama.cpp releases dev-tools 1mo ago
b9330
model: tag ffn_latent as MUL_MAT to fix buft probe ( #23664 ) ffn_latent_down/up are declared GGML_OP_MUL in LLM_TENSOR_INFOS but nemotron-h feeds them through ggml_mul_mat. The loader buft probe asks the backend about the declared op, so it tested an elementwise MUL on a q8_0…
7 -
llama.cpp releases dev-tools 1mo ago
b9329
CUDA: add fast walsh-hadamard transform ( #23615 ) CUDA: add fast walsh-hadamard transform review: add unrolls + change size_t -> int warp size 64 Co-authored-by: Johannes Gäßler johannesg@5d6.de macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI…
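For readers unfamiliar with the transform named in this release, the following is a minimal CPU sketch of the textbook in-place butterfly form of the fast Walsh-Hadamard transform. It is not the CUDA kernel from the PR; the function name `fwht` and the unnormalized output convention are assumptions made for illustration.

```python
def fwht(values):
    """Return the unnormalized Walsh-Hadamard transform of a sequence.

    Illustrative reference only (hypothetical helper, not llama.cpp code).
    The input length must be a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    out = list(values)
    h = 1
    while h < n:
        # Combine pairs of elements that are h apart inside blocks of size 2*h.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out
```

Applying the unnormalized transform twice returns the input scaled by its length, which makes a convenient self-check for any implementation.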
11 -
llama.cpp releases dev-tools 1mo ago
b9326
sync : ggml macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64 (ROCm 7.2) Ubuntu x64 (OpenVINO)…
26 -
llama.cpp releases dev-tools 1mo ago
b9320
TP: fix ggml context size calculation ( #22616 ) TP: fix ggml context size calculation, memory leak move split state cache back into the context revert to constant ggml context size for cgraphs increase headroom for statically allocated tensors remove obsolete include macOS/iOS:…
33 -
llama.cpp releases dev-tools 1mo ago
b9319
ggml: gguf_init_from_callback and gguf_init_from_buffer ( #22341 ) ggml: implement gguf_init_from_buffer test: gguf_init_from_buffer fix: memory breakdown for a model loaded with no_alloc from a file is consistent with being loaded from a buffer fix: use GGML_UNUSED…
9 -
llama.cpp releases dev-tools 1mo ago
b9318
server: MTP layer kv-cache should respect draft type ctk ( #23646 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…
9 -
llama.cpp releases dev-tools 1mo ago
b9315
llama : document that only one on-device state can be saved per sequence ( #23520 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64…
13 -
llama.cpp releases dev-tools 1mo ago
b9313
ggml : Parallelize quant LUT init ( #23595 ) Use OpenMP to parallelize iq2xs_init_impl and iq3xs_init_impl. Move the OpenMP detection from ggml-cpu to ggml-base. Update OpenMP dependencies in ggml-config.cmake.in. macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon…
28 -
llama.cpp releases dev-tools 1mo ago
b9311
vendor : update cpp-httplib to 0.45.1 ( #23639 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu…
25 -
llama.cpp releases dev-tools 1mo ago
b9310
server: fix checkpoints creation ( #22929 ) common : add common_chat_split_by_role cont : fix spans to reach end of message server: fix checkpoints creation extract message_spans from chat templates find the prompt token position before the latest user message split prompt…
36 -
llama.cpp releases dev-tools 1mo ago
b9309: perplexity : fix even more integer overflows (#23623)
Co-authored-by: Stanisław Szymczyk sszymczy@gmail.com
30 -
llama.cpp releases dev-tools 1mo ago
b9305
cmake : fix ui build ( #23592 ) cmake/ui : add -fPIC to llama-ui static lib cmake : rename host compiled embed helper macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU)…
29 -
llama.cpp releases dev-tools 1mo ago
b9301
hexagon: apply repl optimization in flash attn softmax as #22993 ( #23 …
27 -
llama.cpp releases dev-tools 1mo ago
b9297
model : add NVFP4 MTP scale tensors ( #23563 ) Add NVFP4 MTP scale tensors Link Qwen3.5 MTP tensors Aligned nullptr macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU)…
11 -
llama.cpp releases dev-tools 1mo ago
b9296
ggml : Check the right iface method before using the fallback 2d get ( #23514 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64…
9 -
llama.cpp releases dev-tools 1mo ago
b9295
vulkan: fix windows find_package of SPIRV-Headers ( #23215 ) vulkan: fix windows find_package of SPIRV-Headers not windows-only macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu…
17 -
llama.cpp releases dev-tools 1mo ago
b9294
opencl: generalize Adreno MoE kernels on M ( #23449 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…
17 -
llama.cpp releases dev-tools 1mo ago
b9291
SYCL: improve MoE prefill throughput ( #23142 ) change k_copy_src1_to_contiguous so that it uses a precomputed contiguous mapping where all rows "owned" by an expert are in one slice with known starts and ends switch the O(n_as * n_routed_rows) contraption to a counting sort-based…
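The counting-sort idea described in this entry can be sketched on the CPU as follows. This is only an illustration of the general technique, not the SYCL code from the PR; `group_rows_by_expert`, `expert_ids`, and `n_experts` are hypothetical names. It builds a contiguous ordering in which each expert's rows form one slice with known start and end offsets, in a single pass over the rows plus a pass over the experts.

```python
def group_rows_by_expert(expert_ids, n_experts):
    """Group row indices by expert using a counting sort.

    Hypothetical helper for illustration. expert_ids[row] is the expert
    that owns that routed row. Returns (order, starts, ends) such that
    order[starts[e]:ends[e]] lists the rows owned by expert e.
    """
    # Pass 1: count how many rows each expert owns.
    counts = [0] * n_experts
    for e in expert_ids:
        counts[e] += 1

    # Exclusive prefix sum gives each expert's slice start.
    starts = [0] * n_experts
    for e in range(1, n_experts):
        starts[e] = starts[e - 1] + counts[e - 1]
    ends = [s + c for s, c in zip(starts, counts)]

    # Pass 2: scatter each row into the next free position of its slice.
    cursor = list(starts)
    order = [0] * len(expert_ids)
    for row, e in enumerate(expert_ids):
        order[cursor[e]] = row
        cursor[e] += 1
    return order, starts, ends
```

The cost is proportional to the number of rows plus the number of experts, rather than to their product.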
27 -
llama.cpp releases dev-tools 1mo ago
b9292
perplexity : fix integer overflow ( #23496 ) Co-authored-by: Stanisław Szymczyk sszymczy@gmail.com macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU)…
13 -
llama.cpp releases dev-tools 1mo ago
b9290
sycl : Level Zero detection in ggml_sycl_init ( #23097 ) [SYCL] Centralize Level Zero detection in ggml_sycl_init use the same wording get back the warning macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework…
15 -
llama.cpp releases dev-tools 1mo ago
b9289
SYCL : gated_delta_net K>1 ( #23174 ) sycl_gated_delta_net K>1 editor_config macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…
34 -
llama.cpp releases dev-tools 1mo ago
b9286
ggml-zendnn : add Q8_0 quantization support ( #23414 ) ggml-zendnn : add Q8_0 quantization support ggml-zendnn : sync with latest ZenDNN ggml-zendnn : address review comments for Q8_0 macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS…
15 -
llama.cpp releases dev-tools 1mo ago
b9285
cmake : build router app only during standalone builds ( #23521 ) Co-authored-by: Stanisław Szymczyk sszymczy@gmail.com macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64…
17 -
llama.cpp releases dev-tools 1mo ago
b9284
vocab : fix HybridDNA tokenizer ( #23466 ) vocab : mark hybriddna k-mers to avoid BPE token collisions improved loop Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel…
9 -
llama.cpp releases dev-tools 1mo ago
b9283
cmake : add install() for impl libraries + fix apple builds ( #23511 ) pi : update ci : fix ios build ci : fix android ci : fix apple builds cmake : add install() for impl libraries Add install(TARGETS LIBRARY) for all -impl libraries that were changed from STATIC to shared…
14 -
llama.cpp releases dev-tools 1mo ago
b9279
vulkan: fuse snake activation (mul, sin, sqr, mul, add) ( #22855 ) vulkan: fuse snake activation (mul, sin, sqr, mul, add) Add snake.comp shader with F32 / F16 / BF16 pipelines and ggml_vk_snake_dispatch_fused. The matcher recognizes the naive 5 op decomposition emitted by audio…
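To make the fused op sequence concrete, here is a scalar sketch of a five-step (mul, sin, sqr, mul, add) decomposition next to a single fused expression. The operand layout is an assumption: it follows the common snake form x + sin²(a·x)·inv_b, and the names `snake_naive`, `snake_fused`, `a`, and `inv_b` are hypothetical. The actual matcher and shader in the PR may arrange the operands differently.

```python
import math


def snake_naive(x, a, inv_b):
    """Five separate steps, mirroring the op order named in the release title."""
    t = x * a        # mul
    t = math.sin(t)  # sin
    t = t * t        # sqr
    t = t * inv_b    # mul
    return x + t     # add


def snake_fused(x, a, inv_b):
    """The same arithmetic in one expression, as a fused kernel would compute it."""
    s = math.sin(a * x)
    return x + s * s * inv_b
```

Fusing avoids writing and re-reading four intermediate tensors, which is typically where the saving comes from for elementwise chains like this.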
23 -
llama.cpp releases dev-tools 1mo ago
b9277
tests : move save-load-state from examples to tests ( #23336 ) tests : move save-load-state from examples to tests Move examples/save-load-state/ to tests/test-save-load-state.cpp Remove subdirectory reference from examples/CMakeLists.txt Add test to tests/CMakeLists.txt as a…
25 -
llama.cpp releases dev-tools 1mo ago
b9276
server: expose prompt token counts in /slots endpoint ( #23454 ) Add n_prompt_tokens, n_prompt_tokens_processed, and n_prompt_tokens_cache to the /slots JSON response. These fields are already tracked internally but were not exposed, making it impossible for clients to monitor…
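A client could use the three new fields roughly as sketched below. The field names come from the release note; everything else is an assumption for illustration: that the response body is a JSON array of slot objects each carrying an `id`, the helper name `prompt_cache_summary`, and the sample payload used to exercise it. No HTTP request is made here, so the sketch runs without a server.

```python
import json


def prompt_cache_summary(slots_json):
    """Summarize prompt token usage per slot from a /slots response body.

    Hypothetical helper. Assumes a JSON array of slot objects; missing
    fields are treated as zero.
    """
    summary = []
    for slot in json.loads(slots_json):
        total = slot.get("n_prompt_tokens", 0)
        cached = slot.get("n_prompt_tokens_cache", 0)
        summary.append({
            "id": slot.get("id"),
            "prompt_tokens": total,
            "processed": slot.get("n_prompt_tokens_processed", 0),
            # Fraction of the prompt served from cache; 0.0 for an empty prompt.
            "cached_fraction": cached / total if total else 0.0,
        })
    return summary


# Made-up payload for illustration only, not captured from a real server.
sample = json.dumps([
    {"id": 0, "n_prompt_tokens": 100,
     "n_prompt_tokens_processed": 25, "n_prompt_tokens_cache": 75},
])
print(prompt_cache_summary(sample))
```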
15 -
llama.cpp releases dev-tools 1mo ago
b9275
metal : optimize concat kernel and fix set kernel threads ( #23411 ) metal : fix GGML_OP_SET kernel threads tests : extend test_cpy to support different src/dst shapes Extend test_cpy to support different source and destination tensor shapes for CPY operations (reshaping), where…
37 -
llama.cpp releases dev-tools 1mo ago
b9274
server : free draft/MTP resources on sleep to fix VRAM leak ( #23461 ) The destroy() function in server_context_impl only cleaned up the main model and context (via llama_init.reset()) but did not free the speculative decoder (spec), draft context (ctx_dft), or draft model…
22 -
llama.cpp releases dev-tools 1mo ago
b9273
server: re-inject subcommand when router spawns children under unified binary ( #23442 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu…
32 -
llama.cpp releases dev-tools 1mo ago
b9272
app : add batched-bench, fit-params, quantize & perplexity ( #23459 ) app : add batched-bench, fit-params, quantize & perplexity Signed-off-by: Adrien Gallouët angt@huggingface.co Add missing main.cpp Signed-off-by: Adrien Gallouët angt@huggingface.co Add EOL Signed-off-by:…
37 -
llama.cpp releases dev-tools 1mo ago
b9271
mtp: use inp_out_ids for skipping logit computation ( #23433 ) when doing a follow-up decode for the draft model, we were always doing the logit computation even though it is not required. macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS…
23 -
llama.cpp releases dev-tools 1mo ago
b9270
vocab : add Carbon-3B (HybridDNATokenizer) support ( #23410 ) vocab : add Carbon-3B (HybridDNATokenizer) support Adds a new BPE pre-type LLAMA_VOCAB_PRE_TYPE_CARBON for the HybridDNATokenizer used by HuggingFaceBio/Carbon-{500M,3B,8B}. The base BPE is Qwen3-4B-Base's; what…
11 -
llama.cpp releases dev-tools 1mo ago
b9267
ggml : Check the right iface method before using the fallback 2d get ( #23306 ) Probably no backends implement only one of 2d get/set, but this might be annoying for some future backend developer trying to add 2d get/set. macOS/iOS: macOS Apple Silicon (arm64) macOS Apple…
11 -
llama.cpp releases dev-tools 1mo ago
b9266
llama-graph: fix null-buffer crash in llm_graph_input_attn_kv_iswa for SWA-only models ( #23131 ) When a model has zero non-SWA attention layers (e.g. a SWA-only slice of Gemma 4), the base KV cache has no layer tensors. The input tensors (self_k_idxs, self_v_idxs, self_kq_mask)…
13 -
llama.cpp releases dev-tools 1mo ago
b9264
app : show version ( #23426 ) Signed-off-by: Adrien Gallouët angt@huggingface.co macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64…
26 -
llama.cpp releases dev-tools 1mo ago
b9265: hexagon: ssm-conv fix for large prompts (#23307)
hexagon: remove gathers and better handling of vtcm in ssm-conv hexagon: relax ssm-conv gating requirements hexagon: add new prefill ssm-conv backend test hexagon: remove trailing white space hex-rope: uninline rope_cache_init, otherwise it breaks after rebasing with SSM_CONV…
34 -
llama.cpp releases dev-tools 1mo ago
b9263
mtmd, model : merge HunyuanOCR into HunyuanVL and fix OCR vision precision ( #23329 ) HunyuanOCR shares the same HF arch and vision layout as HunyuanVL but was split into a separate path that skipped the +0.1 bilinear sampler used by the HF reference. Collapse OCR into the…
12 -
llama.cpp releases dev-tools 1mo ago
b9260
opencl: refactor backend initialization ( #23318 ) opencl: refactor initialization opencl: refactor GPU identification opencl: rename for consistency opencl: cache global mem size in dev_ctx opencl: adjust log level opencl: load argsort and flash_attn kernels in supports_op…
7 -
llama.cpp releases dev-tools 1mo ago
b9259
common/speculative : fix nullptr crash in get_devices_str ( #23386 ) ggml_backend_dev_by_name always appends a nullptr sentinel to the devices vector. Skipping nullptr entries prevents assertion failure in ggml_backend_dev_name. Assisted-by: llama.cpp:local pi macOS/iOS: macOS…
20 -
llama.cpp releases dev-tools 1mo ago
b9258
mtmd : DeepSeek-OCR image processing fixes, img_tool::resize padding refactor ( #23345 ) mtmd : deepseek-ocr fixes, improvements and refactoring image processing changes to achieve full parity with Pillow (reference impl) SAM mask casting only when flash-attn is on SAM refactor…
24