llama.cpp releases
-
llama.cpp releases dev-tools 23d ago
b9577
server: log prompts to directory ( #22031 ) server: log prompts to directory Add --log-prompts-dir to write each prompt to a separate text file in the specified directory. Apply suggestion from @ngxson Co-authored-by: Xuan-Son Nguyen thichthat@gmail.com macOS/iOS: macOS Apple…
35 -
llama.cpp releases dev-tools 23d ago
b9575
ggml : add GGML_OP_COL2IM_1D ( #24206 ) cpu: add GGML_OP_COL2IM_1D Add the overlap-add (scatter-add) step of a 1D transposed convolution. A ConvTranspose1d factorizes as a GEMM followed by col2im: a weight pre-permuted to [IC, K*OC] is contracted against the [IC, T_in] input…
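The factorization described in this entry can be sketched in plain Python. This is only an illustration of the general technique (GEMM over input channels, then overlap-add), not the ggml implementation: the function names, the `cols[k][oc][t]` layout, the `w[ic][k][oc]` weight layout, and the absence of padding or dilation are all assumptions made for the example.

```python
def col2im_1d(cols, stride):
    """Overlap-add step: scatter cols[k][oc][t] into out[oc][t*stride + k].

    Hypothetical layout; padding and dilation are ignored in this sketch.
    """
    K = len(cols)
    OC = len(cols[0])
    T_in = len(cols[0][0])
    T_out = (T_in - 1) * stride + K
    out = [[0.0] * T_out for _ in range(OC)]
    for k in range(K):
        for oc in range(OC):
            for t in range(T_in):
                # Columns from neighbouring input steps overlap and are summed.
                out[oc][t * stride + k] += cols[k][oc][t]
    return out


def conv_transpose_1d(x, w, stride):
    """ConvTranspose1d as a GEMM over IC followed by col2im_1d.

    x[ic][t] is the input, w[ic][k][oc] the (assumed) weight layout.
    """
    IC, T_in = len(x), len(x[0])
    K, OC = len(w[0]), len(w[0][0])
    # GEMM: contract the IC axis, producing one column per (k, oc, t).
    cols = [
        [
            [sum(w[ic][k][oc] * x[ic][t] for ic in range(IC)) for t in range(T_in)]
            for oc in range(OC)
        ]
        for k in range(K)
    ]
    return col2im_1d(cols, stride)


if __name__ == "__main__":
    x = [[1.0, 2.0]]                  # IC=1, T_in=2
    w = [[[1.0], [1.0], [1.0]]]       # IC=1, K=3, OC=1
    print(conv_transpose_1d(x, w, stride=2))
```

With a stride of 2 and a kernel of 3, the two input steps overlap at exactly one output position, which is where the scatter-add accumulates both contributions.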
4 -
llama.cpp releases dev-tools 23d ago
b9574
server : do not clear slots without unified KV cache ( #24190 ) Always export idle slots to RAM Without this, a slot's VRAM cache may not be written to RAM. If this slot happens to be busy then later on, this triggers needless preprocessing in another slot. cont : clean-up…
33 -
llama.cpp releases dev-tools 23d ago
b9573
models : fix plamo2 attention_key/value_length regression ( #24317 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…
15 -
llama.cpp releases dev-tools 23d ago
b9572
ggml-cpu : fix rms_norm_back wrong output under in-place aliasing ( #24305 ) ggml-cpu : fix rms_norm_back wrong output under in-place aliasing cont : clean-up comment Co-authored-by: Georgi Gerganov ggerganov@gmail.com macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon…
27 -
llama.cpp releases dev-tools 23d ago
b9571
Remove case for GGML_TYPE_Q4_K in mvvq.cu ( #23528 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…
7 -
llama.cpp releases dev-tools 23d ago
b9570
ggml-webgpu: Add clang-format job ( #24308 ) Add clang-format job try local formatting macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU)…
34 -
llama.cpp releases dev-tools 23d ago
b9568
mtp: support for gemma-4 E2B and E4B assistants ( #24282 ) models: update converter to support smaller assistants models: add masked_embd tensors to gemma4-assist arch gemma-4: remove temp debug for conversion gemma-4-mtp: filter out masked_embedding tensors during conversion…
23 -
llama.cpp releases dev-tools 23d ago
b9567
server : do not parse when flushing http headers ( #24281 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…
26 -
llama.cpp releases dev-tools 23d ago
b9566
graph: guard iswa kq_mask on its own buffer ( #24294 ) A SWA-only draft head (e.g. StepFun MTP) leaves the base sub-cache empty, so its kq_mask buffer stays null and asserts at load. Guard each mask on its own buffer in set_input and can_reuse, base and swa. Co-authored-by:…
23 -
llama.cpp releases dev-tools 23d ago
b9565
[ggml-webgpu] Handle buffer overlap / buffer aliasing for concat operator ( #24000 ) Only run webgpu CI on my fork Add webgpu only workflow handle buffer overlap case for concat operator restore build-webgpu.yml Co-Authored-By: Claude Sonnet 4.6 noreply@anthropic.com Run…
14 -
llama.cpp releases dev-tools 23d ago
b9564
[ggml-webgpu] Implement 2D workgroups for scale, binary, and unary ops ( #24044 ) Only run webgpu CI on my fork Add webgpu only workflow Implement 2d workgroups for more operations fix Fix type Move back to global_invocation_id macOS/iOS: macOS Apple Silicon (arm64) macOS Apple…
24 -
llama.cpp releases dev-tools 24d ago
b9562
mtmd : add video input support ( #24269 ) wip ok: lazy bitmap API remember to free lazy text wip add mtmd_helper_video support video input on server (base64 input) add MTMD_VIDEO config add timestamp update CLI cli: allow auto-completion for video add --video arg fix build…
22 -
llama.cpp releases dev-tools 24d ago
b9561
sync : ggml macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64 (ROCm 7.2) Ubuntu x64…
13 -
llama.cpp releases dev-tools 24d ago
b9559
cli: fix spinner not showing during prompt processing ( #24283 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…
10 -
llama.cpp releases dev-tools 24d ago
b9563
docker: install ffmpeg in the released image ( #24302 )
24 -
llama.cpp releases dev-tools 24d ago
b9558
vulkan: Use cm2 decode_vector for mul_mat_id B matrix loads ( #23991 ) This allows vec4 loads of the B elements. Also increase BK to 64 when this is enabled. Neither of these alone is consistently faster, but together these give a nice speedup. In ggml-vulkan.cpp, we need to…
28 -
llama.cpp releases dev-tools 24d ago
b9557
cuda: reset cuda context after reading memory size ( #23935 ) cuda: reset device in get_memory function if no backend is active also count device and host buffers exclude hip and musa from counting and device reset use device mutex instead of atomic undo backend_free function…
34 -
llama.cpp releases dev-tools 24d ago
b9556
HIP: add gfx1152 and gfx1153 to RDNA3.5 ( #24129 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…
10 -
llama.cpp releases dev-tools 24d ago
b9555
metal : fix im2col 1D case (audio models) ( #24220 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…
29 -
llama.cpp releases dev-tools 24d ago
b9553
common : relax sampler name matching ( #23744 ) common : relax sampler name matching Currently, in some cases, the alternative names for samplers (like top-k and min-p instead of the canonical top_k and min_p ) are not always recognized by the common_sampler_types_from_names…
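One way such relaxed matching could work is to normalize a name before looking it up. The sketch below is an assumption for illustration, not the logic of common_sampler_types_from_names: the helper names, the normalization rule (lowercase, hyphens to underscores), and the small `CANONICAL` set are all hypothetical.

```python
# Hypothetical subset of canonical sampler names, for illustration only.
CANONICAL = {"top_k", "top_p", "min_p", "temperature"}


def normalize_sampler_name(name):
    """Map an alternative spelling such as 'top-k' to 'top_k', or return None."""
    key = name.strip().lower().replace("-", "_")
    return key if key in CANONICAL else None


def sampler_types_from_names(names):
    """Keep the recognized samplers, in order, skipping unknown names."""
    result = []
    for name in names:
        canonical = normalize_sampler_name(name)
        if canonical is not None:
            result.append(canonical)
    return result


if __name__ == "__main__":
    print(sampler_types_from_names(["top-k", "Min-P", "bogus", "top_p"]))
```

Normalizing once at the lookup boundary keeps a single canonical spelling everywhere else, so alternative names do not need their own table entries.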
32 -
llama.cpp releases dev-tools 24d ago
b9551
kv-cache : avoid kv cells copies ( #24277 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…
7 -
llama.cpp releases dev-tools 25d ago
b9550
kv-cache: follow the source cache size when sharing cells ( #24267 ) A fitted target context can end up smaller than the draft default, the oversized assistant views then overflow the shared K/V tensors and trip the ggml_view_4d size assert during graph reserve. macOS/iOS: macOS…
25 -
llama.cpp releases dev-tools 25d ago
b9549
llama : add Gemma4 MTP ( #23398 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64…
18 -
llama.cpp releases dev-tools 25d ago
b9548
spec : fix vocab compatibility check ( #24256 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…
14 -
llama.cpp releases dev-tools 25d ago
b9547
arg: Skip mmproj download when user supplied mmproj ( #24239 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…
32 -
llama.cpp releases dev-tools 25d ago
b9544
common/chat : fix LFM2/LFM2.5 reasoning round-trip and leak ( #24234 ) common/chat : fix LFM2 reasoning round-trip and stray leak Gate by reasoning format and whether the template supports macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled)…
30 -
llama.cpp releases dev-tools 25d ago
b9543
mtmd: support "frame merge" for qwen-vl-based models ( #21858 ) feat: add video support for Qwen3.5 various clean up revise the design fix llava-uhd case nits nits 2 Co-authored-by: andrewmd5 1297077+andrewmd5@users.noreply.github.com macOS/iOS: macOS Apple Silicon (arm64) macOS…
37 -
llama.cpp releases dev-tools 26d ago
b9542
completion : remove useless statics ( #24226 ) Signed-off-by: Adrien Gallouët angt@huggingface.co macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu…
6 -
llama.cpp releases dev-tools 26d ago
b9541
completion : fix format specifier in LOG_INF ( #24213 ) Signed-off-by: Adrien Gallouët angt@huggingface.co macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU)…
7 -
llama.cpp releases dev-tools 26d ago
b9538
model : rename local n_layer_all variable ( #24209 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…
15 -
llama.cpp releases dev-tools 26d ago
b9537
context : fix off-by-one comparisons to n_gpu_layers ( #24208 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…
37 -
llama.cpp releases dev-tools 26d ago
b9536
opencl: improve get_rows, cpy, concat and q6_k flat gemv ( #24160 ) opencl: allow multiple workgroups for large rows opencl: improve small cpy opencl: packed concat for small input opencl: tweak flat q6_K gemv, increase N_DST and remap threads macOS/iOS: macOS Apple Silicon…
27 -
llama.cpp releases dev-tools 26d ago
b9535
common/chat : unify and fix LFM2/LFM2.5 tool parser ( #24178 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…
19 -
llama.cpp releases dev-tools 26d ago
b9534
vulkan: add fwht support for Intel with shmem reduction ( #23964 ) vulkan: add fwht support for Intel with shmem reduction don't use N as workgroup size disable subgroup shuffle on MoltenVK AMD disable fwht shader on Intel Windows due to driver bug macOS/iOS: macOS Apple Silicon…
21 -
llama.cpp releases dev-tools 26d ago
b9533
model: fix build failed ( #24193 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64…
28 -
llama.cpp releases dev-tools 27d ago
b9531
TP: round up granularity to 128 ( #24180 ) TP: round up granularity to 128 remove assert macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU)…
25 -
llama.cpp releases dev-tools 27d ago
b9530
cli: fix model params not propagated ( #23893 ) Fixes #23847 macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…
21 -
llama.cpp releases dev-tools 27d ago
b9529
model : fix llama_model::n_gpu_layers() ( #24188 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…
36 -
llama.cpp releases dev-tools 27d ago
b9528
ui: run npm install when package-lock.json is newer than node_modules ( #24171 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu…
17 -
llama.cpp releases dev-tools 27d ago
b9524
minor : fix lint issues ( #24165 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64…
17 -
llama.cpp releases dev-tools 27d ago
b9523
hparams : refactor hparams.n_layer ( #24060 ) hparams : refactor hparams.n_layer cont : remove n_layer_kv() , use n_layer_all instead cont : type consistency pi : update SYSTEM.md models : fix Step3.5 MTP cont : remove duplicate switch cases cont : explicitly set false to extra…
30 -
llama.cpp releases dev-tools 27d ago
b9522
kleidiai : dynamic chunk-based scheduling for hybrid execution ( #23819 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64…
16 -
llama.cpp releases dev-tools 27d ago
b9521
CUDA: enroll mul_mat_vec_q_moe into pdl ( #24087 ) Enroll mul_mat_vec_q_moe into PDL, boosting MTP performance on BW Data collected on a B4500: Before (llama.cpp) ➜ llama.cpp git:(master) ✗ python mtp-bench.py code_python pred= 192 draft= 150 acc= 116 rate=0.773 tok/s=202.8…
10 -
llama.cpp releases dev-tools 27d ago
b9519
sycl : port multi-column MMVQ from CUDA backend ( #21845 ) mmvq: Port the ncols_dst optimization from ggml-cuda/mmvq.cu to SYCL. Read weights once per dispatch instead of once per column. Covers all standard quant types + reorder paths for Q4_0, Q8_0, Q3_K, Q4_K, Q5_K, Q6_K. IQ…
4 -
llama.cpp releases dev-tools 27d ago
b9518
server : disable on-device spec checkpoints ( #24108 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…
15 -
llama.cpp releases dev-tools 28d ago
b9515
Move duplicated imatrix code into single common imatrix-loader.cpp ( #22445 ) Deduplicate imatrix loading code Add back LLAMA_TRACE, early exit on quantize missing metadata macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel…
26 -
llama.cpp releases dev-tools 28d ago
b9512
return filter to save memory ( #24125 ) Co-authored-by: lvyichen lvyichen@stepfun.com macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU)…
25 -
llama.cpp releases dev-tools 28d ago
b9510
ggml: vectorize ggml_vec_dot_q4_1_q8_1 with WASM SIMD128 ( #22209 ) ggml: vectorize ggml_vec_dot_q4_1_q8_1 with WASM SIMD128 Optimize the inner loop of ggml_vec_dot_q4_1_q8_1_generic using WASM SIMD128 intrinsics, gated behind #ifdef __wasm_simd128__ so non-wasm builds are…
11