llama.cpp releases
468 articles archived
-
llama.cpp releases dev-tools 13d ago
b9722
server: fix non-bound n_discard value (ctx shifting) (#24786). Update tools/server/server-context.cpp. Co-authored-by: Georgi Gerganov ggerganov@gmail.com. macOS/iOS: macOS Apple Silicon…
36 -
llama.cpp releases dev-tools 13d ago
b9721
sync : ggml macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64 (ROCm 7.2) Ubuntu x64…
31 -
llama.cpp releases dev-tools 13d ago
b9718
server : consolidate slot selection into get_available_slot (#24755). Absorb get_slot_by_id logic into get_available_slot so slot selection is handled by a single function call. When a specific slot id is requested, the LCP similarity check still runs to enable proper prompt…
25 -
llama.cpp releases dev-tools 13d ago
b9717
ggml-cpu: support K tails in Power10 Q8/Q4 MMA matmul (#24753). This patch removes the requirement that K be divisible by kc in the tinyBlas_Q0_PPC tiled matmul path. Process the final K panel using its actual depth and pass…
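The idea of a "K tail" can be illustrated with a minimal sketch. This is not the Power10 MMA kernel or tinyBlas_Q0_PPC itself: the function name matmul_k_tiled and the plain-list matrices are assumptions made for illustration, and only the kc panel size comes from the entry above. The sketch walks K in panels of kc and lets the final panel use whatever depth is left.

```python
def matmul_k_tiled(a, b, kc):
    """Multiply a (M x K) by b (K x N), walking K in panels of at most kc.

    Hypothetical illustration only: assumes K >= 1 and plain nested lists.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for k0 in range(0, k, kc):
        # The final panel uses its actual depth, so K need not divide by kc.
        depth = min(kc, k - k0)
        for i in range(m):
            for j in range(n):
                acc = 0.0
                for p in range(k0, k0 + depth):
                    acc += a[i][p] * b[p][j]
                c[i][j] += acc
    return c
```

With K = 5 and kc = 4, the loop processes one full panel of depth 4 and one tail panel of depth 1, and the result matches an untiled product.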
38 -
llama.cpp releases dev-tools 13d ago
b9716
mtmd: add batching support for internvl (#24775)
16 -
llama.cpp releases dev-tools 13d ago
b9715
Ggml/cuda col2im 1d (#24417). cuda: add GGML_OP_COL2IM_1D, follow-up to the CPU op; cuda: col2im_1d use fast_div_modulo for the index decomposition; cuda: col2im_1d tighten supports_op, type match and contiguous dst
30 -
llama.cpp releases dev-tools 13d ago
b9714
server: add "X-Accel-Buffering": "no" header to streaming endpoints (#24774). This header tells Nginx (as a reverse proxy) to NOT buffer responses (only affects streaming endpoints). Without it, Nginx will…
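A minimal sketch of what such a streaming response could look like, assuming a server-sent-events endpoint. The helper names streaming_headers and sse_event are hypothetical, and the Content-Type and Cache-Control values are typical SSE choices rather than something stated in the entry; only the X-Accel-Buffering header comes from the release note.

```python
def streaming_headers(extra=None):
    """Build response headers for a streaming (SSE) endpoint.

    Hypothetical helper; only "X-Accel-Buffering": "no" is taken from the
    release note, the other two headers are common SSE defaults.
    """
    headers = {
        "Content-Type": "text/event-stream",
        "Cache-Control": "no-cache",
        "X-Accel-Buffering": "no",  # asks Nginx not to buffer this response
    }
    if extra:
        headers.update(extra)
    return headers


def sse_event(data):
    """Encode one event; each line of the payload gets its own data: field."""
    body = "".join(f"data: {line}\n" for line in data.split("\n"))
    return (body + "\n").encode("utf-8")
```

Non-streaming endpoints would simply not use streaming_headers, which matches the note that only streaming endpoints are affected.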
11 -
llama.cpp releases dev-tools 13d ago
b9713
mtmd: add batching for mtmd-cli, add video tests (#24778)
22 -
llama.cpp releases dev-tools 13d ago
b9712
cmake : fix ui build with read-only source (#24752)
4 -
llama.cpp releases dev-tools 13d ago
b9711
mtmd: refactor llava-uhd overview image handling (always use ov_img_first) (#24769). add dedicated "overview" for mtmd_image_preproc_out; corrections; correct (again); nits; nits (2)
14 -
llama.cpp releases dev-tools 13d ago
b9707
server: add "schema" and validation (#24150). wip; working; correct some limits; add field name to error message
5 -
llama.cpp releases dev-tools 13d ago
b9704
server : return HTTP 400 on invalid grammar (#24144) (#24154). Throw on grammar parse failure so the server returns HTTP 400 instead of silently dropping the constraint. Add a regression test for the invalid-grammar response. Fixes #24144.
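The behaviour described above (fail loudly instead of dropping the constraint) can be sketched as follows. Everything here is a hypothetical stand-in: parse_grammar is a toy validator that only checks for `name ::= body` lines, not the real grammar parser, and handle_completion is not the server's actual handler.

```python
class GrammarError(ValueError):
    """Raised when a grammar string cannot be parsed."""


def parse_grammar(text):
    """Toy parser: every non-empty line must look like `name ::= body`."""
    rules = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        name, sep, body = line.partition("::=")
        if not sep or not name.strip() or not body.strip():
            raise GrammarError(f"line {lineno}: expected 'name ::= body'")
        rules[name.strip()] = body.strip()
    return rules


def handle_completion(request):
    """Return (status, body); an invalid grammar yields 400, never a silent 200."""
    grammar = request.get("grammar")
    if grammar is not None:
        try:
            parse_grammar(grammar)
        except GrammarError as err:
            return 400, {"error": f"invalid grammar: {err}"}
    return 200, {"content": "..."}  # placeholder for the generated text
```

The design point is that the parse error propagates to the request boundary, where it is mapped to a client error, rather than being swallowed where the grammar is loaded.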
26 -
llama.cpp releases dev-tools 13d ago
b9703
server: (router) rework -hf preset repo (#24739). server: temporarily remove HF remote preset; rework; remove preset.ini support; rm unused get_remote_preset_whitelist(); print warning; add docs; rm stray file
19 -
llama.cpp releases dev-tools 13d ago
b9702
server: fix router args not being forwarded to child instances (#24760)
14 -
llama.cpp releases dev-tools 13d ago
b9701
mtmd: refactor preprocessor, add mtmd_image_preproc_out (#24736). add mtmd_image_preproc_out; add dev docs; remove unused clip API; rm unused clip_image_f32_batch::grid; change preprocess() call signature
15 -
llama.cpp releases dev-tools 14d ago
b9700
[SYCL] rename GGML_SYCL_SUPPORT_LEVEL_ZERO (#24719). rename GGML_SYCL_SUPPORT_LEVEL_ZERO to GGML_SYCL_SUPPORT_LEVEL_ZERO_API, and GGML_SYCL_ENABLE_LEVEL_ZERO to GGML_SYCL_USE_LEVEL_ZERO_API; fix code format; fix error when rebasing
31 -
llama.cpp releases dev-tools 14d ago
b9699
sycl : support MUL_MAT and OUT_PROD with Q1_0 (#24721)
31 -
llama.cpp releases dev-tools 14d ago
b9698
app : enable self-update only when built with llama-install.sh (#24754). Signed-off-by: Adrien Gallouët angt@huggingface.co
34 -
llama.cpp releases dev-tools 14d ago
b9697
ci : fix check-release message parsing (#24751)
25 -
llama.cpp releases dev-tools 14d ago
b9694
ci : fix Windows x64 (OpenVINO) release link (#24731)
28 -
llama.cpp releases dev-tools 14d ago
b9693
metal : check for BF16 support in concat kernel (#24747)
16 -
llama.cpp releases dev-tools 14d ago
b9692
mtmd: llava_uhd should no longer use batch dim (#24732)
32 -
llama.cpp releases dev-tools 14d ago
b9691
ggml-cpu: Conditionally enable power11 backend based on compiler support (#24687). Guard POWER11 backend creation behind a compiler flag check for -mcpu=power11. This avoids build failures on current GCC/Clang…
14 -
llama.cpp releases dev-tools 14d ago
b9690
metal : implement rope_back operator (#24725). Reuse existing rope kernels with a function constant to toggle forward/backward rotation, avoiding duplicate kernel code. Assisted-by: pi:llama.cpp/Qwen3.6-27B
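A rough sketch of the "one kernel, one toggle" idea, written as plain Python rather than Metal. The function name rope_pairs, the adjacent-pair layout, and the base of 10000 are assumptions for illustration. Because each pair is rotated by an orthogonal matrix, the backward direction is the same rotation with the angle negated, so a single flag can select between the two.

```python
import math


def rope_pairs(x, pos, base=10000.0, backward=False):
    """Rotate consecutive pairs (x[i], x[i+1]) by pos * base**(-i/d).

    Hypothetical illustration: backward=True only flips the sign of the
    angle, so forward and backward share all of the remaining code.
    """
    d = len(x)
    sign = -1.0 if backward else 1.0
    out = list(x)
    for i in range(0, d - 1, 2):
        theta = sign * pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out
```

Applying the forward rotation and then the backward one at the same position recovers the input up to floating-point error, which is a quick sanity check for the sign flip.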
27 -
llama.cpp releases dev-tools 14d ago
b9689
metal : add f16 and bf16 support for concat operator (#24724). Extend the Metal backend concat operator to support f16 and bf16 tensor types in addition to the existing f32 and i32 support. Template kernel_concat on type T…
34 -
llama.cpp releases dev-tools 14d ago
b9688
server: (router) add model management API (#23976). wip; server: (router) add SSE realtime updates API; nits; wip; add download API; update docs; add delete endpoint; fix std::terminate; fix crash; fix 2; add tests; nits
17 -
llama.cpp releases dev-tools 14d ago
b9687
llama : skip main_gpu validation when no devices are available (#23405)
11 -
llama.cpp releases dev-tools 14d ago
b9686
spec: fix segfault error on long prompts for eagle3 (#24707)
17 -
llama.cpp releases dev-tools 14d ago
b9685
[SYCL] add dev2dev memcpy by SYCL API (#24476). mv GGML_SYCL_DEV2DEV_MEMCPY to runtime table; update the detect method for p2p comm; fix the error created while fixing a conflict. Co-authored-by: Neo Zhang
33 -
llama.cpp releases dev-tools 14d ago
b9684
[SYCL] Add conv_3d (#24691). add conv_3d; optimize; update ops.md; restore test script; rm unused code; rm copyright notes
15 -
llama.cpp releases dev-tools 14d ago
b9682
vulkan: record actual memory properties during buffer creation (#24326)
9 -
llama.cpp releases dev-tools 14d ago
b9678
opencl: optimize mul_mat_f16_f32_l4 for decode (#24504)
4 -
llama.cpp releases dev-tools 15d ago
b9677
common: update logging to enforce max_capacity and optimize queue resizing (#24490). common: update logging to enforce max_capacity and optimize queue resizing logic; common/log: remove queue expansion logic
35 -
llama.cpp releases dev-tools 15d ago
b9675
sycl : enable fp16 support for ops SQR, SQRT, LOG, SIN, COS, CLAMP (#24692)
33 -
llama.cpp releases dev-tools 15d ago
b9674
SYCL: fix use-after-free bug with async memcpy in MoE prefill (#24676). SYCL: fix a bug with async memcpy; make mmid_row_mapping_host persistent; comment on stream->wait; apply suggestion from @sanmai
34 -
llama.cpp releases dev-tools 15d ago
b9673
sycl: Add optional USM system allocations (#22526). This introduces an optional feature to allocate large GPU buffers (≥ 1 GB) using USM system allocations if supported by the device. It allows using buffers from the system allocator and then letting the system manage memory…
18 -
llama.cpp releases dev-tools 15d ago
b9672
vendor : update BoringSSL to 0.20260616.0 (#24693)
31 -
llama.cpp releases dev-tools 15d ago
b9670
Fix and restrict NVFP4 edge-cases in llama-graph (#24331). Move post-GEMM MUL required for dequant before lora and bias add (see #23484): for lora, I would presume we want fully dequantized values before doing the residuals, but this depends on how the LoRAs were generated.…
26 -
llama.cpp releases dev-tools 15d ago
b9669
spec: add backend sampling support for eagle3 (#24655)
27 -
llama.cpp releases dev-tools 16d ago
b9668
vulkan: prefer host-visible memory buffers on UMA devices (#22930). implement UMA host-visible memory; update based on 0cc4m's suggestion
37 -
llama.cpp releases dev-tools 16d ago
b9667
vulkan: Support gated_delta_net with S_v=16 (#24581)
38 -
llama.cpp releases dev-tools 16d ago
b9665
bench : add --offline (#24511). Add default. Signed-off-by: Adrien Gallouët angt@huggingface.co
29 -
llama.cpp releases dev-tools 16d ago
b9663
[SYCL] Support OP EXPM1, support all UT cases of FLOOR, TRUNC, ROUND (#24363). fix conflict; rebase, support new UT case of repeat, concat
17 -
llama.cpp releases dev-tools 16d ago
b9664: sycl: support reordered Q4_K/Q5_K/Q6_K MoE MUL_MAT_ID (#24452)
sycl: support reordered Q4_K and Q5_K MoE MUL_MAT_ID. Extend reordered-weight handling to fused MoE MUL_MAT_ID for Q4_K and Q5_K expert tensors and add Q5_K reordered DMMV coverage. Unsupported 3D reorder cases now fall back instead of aborting. sycl: extend MoE reorder to Q6_K…
21 -
llama.cpp releases dev-tools 16d ago
b9661
vulkan: add col2im_1d op (#24425). vulkan: add GGML_OP_COL2IM_1D, follow-up to the CPU op; vulkan: col2im_1d bounded gather loop instead of full-K scan with modulo; vulkan: col2im_1d address review from @jeffbolznv; vulkan: col2im_1d return nullptr for unsupported types, address…
20 -
llama.cpp releases dev-tools 16d ago
b9660
chat : fix LFM2 tool-call parsing double-escaping (#24667). Add escape test cases
20 -
llama.cpp releases dev-tools 16d ago
b9659
mtmd: fix miscounting n_tokens (#24656)
20 -
llama.cpp releases dev-tools 16d ago
b9658
chat: include full unparsed prompt in debug message on parse error (#24650)
5 -
llama.cpp releases dev-tools 16d ago
b9656
chat: harden peg-native tool call parsing (#24329). Accept an optional leading type: function field in build_json_tools_flat_keys so OpenAI-style tool calls parse on templates whose serialization opens on the name field. Return a clean…
27