llama.cpp releases
468 articles archived · Visit source ↗ · RSS
-
llama.cpp releases dev-tools 7d ago
b9782
common: remove unused json-partial ( #24968 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…
5 -
llama.cpp releases dev-tools 7d ago
b9781
vulkan: allow reducing the graph submission batches to avoid timeouts ( #24872 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu…
7 -
llama.cpp releases dev-tools 7d ago
b9780
vulkan: fail the build when a shader fails to compile ( #24450 ) vulkan-shaders-gen: fail the build when a shader fails to compile vulkan-shaders-gen did not detect shader-compile subprocess failures, so a broken libggml-vulkan could be produced while the build reported success…
18 -
llama.cpp releases dev-tools 8d ago
b9777
model : Add LFM2.5-ColBERT-350M and LFM2.5-Embedding-350M ( #24913 ) model : Add LFM2.5-ColBERT-350M and LFM2.5-Embedding-350M Restore LFM2 models in README.md macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS…
7 -
llama.cpp releases dev-tools 8d ago
b9776
vulkan: Apply bias before softmax in FA, to avoid overflow ( #24909 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…
26 -
llama.cpp releases dev-tools 8d ago
b9775
server : check draft context creation error ( #24922 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…
13 -
llama.cpp releases dev-tools 8d ago
b9774
vulkan: support all backend tests for SQR/SQRT/SIN/COS/CLAMP/LEAKY_RELU/NORM ( #24582 ) vulkan: make SQR/SQRT/SIN/COS/CLAMP/LEAKY_RELU use unary.comp vulkan: make NORM support noncontig add noncontiguous row test cases for norm/l2_norm, handle this in the CPU backend and…
31 -
llama.cpp releases dev-tools 8d ago
b9773
vulkan: Support GET_ROWS_BACK ( #24883 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu…
23 -
llama.cpp releases dev-tools 8d ago
b9771
vulkan: make mul_mm ALIGNED a spec constant ( #24689 ) This trims down some of the shader variant explosion and reduces binary size. macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64…
26 -
llama.cpp releases dev-tools 8d ago
b9770
server: fix remote preset handling, add test ( #24938 ) server: add test for remote preset fix remote preset handling fix fix test macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64…
20 -
llama.cpp releases dev-tools 8d ago
b9769
vulkan: link ggml-cpu when GGML_VULKAN_CHECK_RESULTS / RUN_TESTS are enabled ( #24444 ) The result-checking and test debug paths in ggml-vulkan.cpp call ggml_graph_compute_with_ctx() to compute a CPU reference graph, but that symbol is defined in ggml-cpu, which ggml-vulkan does…
37 -
llama.cpp releases dev-tools 8d ago
b9768
model: Granite Speech Plus ( #24818 ) feat: Add conversion support for Granite Speech Plus Branch: GraniteSpeechPlus AI-usage: full (Bob, OpenCode + Qwen3.6-35b) Signed-off-by: Gabe Goodhart ghart@us.ibm.com feat: Extend granite_speech to support plus multi-layer concatenation…
27 -
llama.cpp releases dev-tools 9d ago
b9767
ggml-webgpu: improve MTP inference by using mat-vec path for small batches ( #24811 ) ggml-webgpu: improve small batches decoding Add barrier to the NUM_COLS loop in mul-mat-vec macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS…
21 -
llama.cpp releases dev-tools 9d ago
b9765
server: improve user message detection and create checkpoints at ever…
20 -
llama.cpp releases dev-tools 9d ago
b9763
server : Add id to tool call responses api ( #24882 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…
32 -
llama.cpp releases dev-tools 9d ago
b9761
server: (router) move model downloading to dedicated process ( #24834 ) server: real-time model load progress tracking via /models/sse update docs server: move model download to child process rm unused fix most problems clean up nit fixes fix test case do not detact() thread…
8 -
llama.cpp releases dev-tools 9d ago
b9760
server: refactor/generalize input file schema ( #24299 ) server: refactor/generalize input file schema wire up input_video, accept raw base64 nits nits (2) fix windows macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64)…
36 -
llama.cpp releases dev-tools 9d ago
b9758
[SYCL] support bf16 on bin_bcast OP and unary OPs ( #24838 ) support bf16 on bin_bcast OP and unary OPs support the older Intel compiler than 2026.0 macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework…
23 -
llama.cpp releases dev-tools 9d ago
b9757
sampling : remove unconditional softmax+sort in top-n-sigma sampler ( #22645 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64…
13 -
llama.cpp releases dev-tools 10d ago
b9756
server: fix edit_file crash on append at end of file (line_start -1) ( #24893 ) line_start -1 normalized to n+1, so append inserted at lines.begin() + n + 1, one past end() -> heap-buffer-overflow in vector::_M_range_insert. Normalize -1 to n (insert at end()), restrict -1 to…
24 -
llama.cpp releases dev-tools 10d ago
b9755
docs/android.md: Add dependency libandroid-spawn for building in te…
7 -
llama.cpp releases dev-tools 10d ago
b9754
common/peg : implement ac parser for stricter grammar generation ( #24869 ) common/peg : implement ac parser cont : extract functions cont : tidy up cont : remove a test cont : move ac() def macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled)…
33 -
llama.cpp releases dev-tools 10d ago
b9753
server: fix report progress for loading spec models, add "stages" list ( #24870 ) server: fix report progress for loading spec models, add "stages" list improve nits nits 2 macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel…
28 -
llama.cpp releases dev-tools 10d ago
b9752
server: refactor batch construction ( #24843 ) server: refactor batch construction wip wip 2 wip 3 wip 4 add abort_all_slots handle batch full more carefully fix assert rm debug log small nits (debug) add timings debug: force llama_synchronize for accurate timings address…
5 -
llama.cpp releases dev-tools 10d ago
b9751
mtmd: fix mtmd_get_memory_usage ( #24867 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…
13 -
llama.cpp releases dev-tools 10d ago
b9750
jinja : implement call statement ( #24847 ) implement call statement undo unintended change de-lambda simplify move caller context inside function handler macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS…
9 -
llama.cpp releases dev-tools 10d ago
b9748
server: add "verbose" field to schema ( #24864 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…
18 -
llama.cpp releases dev-tools 10d ago
b9747
server: real-time model load progress tracking via /models/sse ( #24828 ) server: real-time model load progress tracking via /models/sse update docs add mutex for notify_to_router correct docs macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled)…
28 -
llama.cpp releases dev-tools 10d ago
b9745
spec : Support Step3.5/3.7 flash mtp3 ( #24340 ) add mtp_layer_offset + include nextn flags in graph reuse add llama_set_mtp_layer_offset + llama_model_n_nextn_layer API offset head select + require all MTP blocks speculative multi-head process() speculative multi-head draft()…
6 -
llama.cpp releases dev-tools 11d ago
b9744
common/peg : refactor until gbnf grammar generation ( #24839 ) common/peg : refactor until gbnf grammar into an ac automaton cont : add a test with multiple strings cont : pad state with 0s so rules line up cont : clean up comments cont : use set everywhere cont : inline state…
4 -
llama.cpp releases dev-tools 11d ago
b9743
common/json-schema-to-grammar : align spacing rules with parsers ( #24835 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64…
11 -
llama.cpp releases dev-tools 11d ago
b9742
fix(hexagon): use padded stride for ssm-conv weights ( #24470 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…
4 -
llama.cpp releases dev-tools 11d ago
b9741
llama : use LLM_KV for quantization_version & file_type ( #24802 ) Signed-off-by: Adrien Gallouët angt@huggingface.co macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu…
27 -
llama.cpp releases dev-tools 11d ago
b9740
arg: try fixing test-args-parser randomly fails ( #24826 ) arg: try fixing test-args-parser randomly fails return ref try triggering the workflow exception wrapper wip test test 2 arg: guard win32 utf8 argv override make_utf8_argv rebuilds argv from GetCommandLineW to fix utf8…
8 -
llama.cpp releases dev-tools 11d ago
b9739
release: add missing link for win opencl adreno arm64 ( #24809 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…
33 -
llama.cpp releases dev-tools 11d ago
b9738
server: avoid forwarding auth headers in CORS proxy ( #24373 ) server: avoid forwarding auth headers in CORS proxy format fix test fix e2e test Co-authored-by: Xuan Son Nguyen son@huggingface.co macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled)…
19 -
llama.cpp releases dev-tools 11d ago
b9736
model : glm-dsa load DSA indexer tensors as optional ( #24770 ) GLM-5.2 ships the DSA "lightning indexer" on only a subset of layers (the "full" layers; others omit it), but the GLM_DSA loader created the five indexer tensors on every layer as required, so loading any GLM-5.2…
14 -
llama.cpp releases dev-tools 11d ago
b9737
docker : prebuild web UI for s390x build [no release] ( #24829 )
31 -
llama.cpp releases dev-tools 12d ago
b9733
ggml-webgpu: add adapter toggles for F16 on Vulkan + NVIDIA macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…
11 -
llama.cpp releases dev-tools 12d ago
b9732
server: refactor child --> router communication ( #24821 ) server: refactor child --> router communication fix wakeup case add docs improve update_status() nits macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS…
13 -
llama.cpp releases dev-tools 12d ago
b9731
server : optimize get_token_probabilities ( #24796 ) Use std::partial_sort to order only the requested top-n tokens instead of the full vocabulary logprobs sort: vocab=128000 n_top=0 iters=100 full sort: 8555.6 us/op partial sort: 704.3 us/op Signed-off-by: Adrien Gallouët…
37 -
llama.cpp releases dev-tools 12d ago
b9730
mtmd, arg: fix utf8 handling on windows ( #24779 ) mtmd, arg: fix utf8 handling on windows also fix ggml_fopen fix build fail also fix CLI macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux:…
36 -
llama.cpp releases dev-tools 12d ago
b9729
server: remove all internal mentions about "webui" ( #24817 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…
32 -
llama.cpp releases dev-tools 12d ago
b9728
arg: Add comment line support to --api-key-file ( #23168 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…
25 -
llama.cpp releases dev-tools 12d ago
b9727
vendor : update cpp-httplib to 0.48.0 ( #24787 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…
38 -
llama.cpp releases dev-tools 12d ago
b9726
server: add --agent arg, remove redundant webui naming compat ( #24801 ) server: add --agent arg, remove redundant webui naming compat corrent env fix the test llama-gen-docs nits: wordings macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled)…
10 -
llama.cpp releases dev-tools 12d ago
b9725: docker : build the UI (#24794)
docker : build the UI cont : use existing APP_VERSION
5 -
llama.cpp releases dev-tools 12d ago
b9724
mtmd: several bug fixes ( #24784 ) mtmd: several bug fixes fix build fix gemma4ua add sanity check in get_u32() fix build (2) area() avoid overflow macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework…
27 -
llama.cpp releases dev-tools 12d ago
b9723
spec: support eagle3 for qwen3.5 & 3.6 ( #24593 ) spec: support qwen3.5 & 3.6 eagle3 draft eagle3: Add deferred boundary checkpoints restore support for hybrid models apply suggestions Co-authored-by: Georgi Gerganov ggerganov@gmail.com spec: adapt to API change spec: fix naming…
21