llama.cpp releases

468 articles archived

llama.cpp releases dev-tools 7d ago

b9782

common: remove unused json-partial (#24968). macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…

5
llama.cpp releases dev-tools 7d ago

b9781

vulkan: allow reducing the graph submission batches to avoid timeouts (#24872)

7
llama.cpp releases dev-tools 7d ago

b9780

vulkan: fail the build when a shader fails to compile (#24450). vulkan-shaders-gen did not detect shader-compile subprocess failures, so a broken libggml-vulkan could be produced while the build reported success…

18
llama.cpp releases dev-tools 8d ago

b9777

model : Add LFM2.5-ColBERT-350M and LFM2.5-Embedding-350M (#24913). model : Add LFM2.5-ColBERT-350M and LFM2.5-Embedding-350M; Restore LFM2 models in README.md

7
llama.cpp releases dev-tools 8d ago

b9776

vulkan: Apply bias before softmax in FA, to avoid overflow (#24909)

26
llama.cpp releases dev-tools 8d ago

b9775

server : check draft context creation error (#24922)

13
llama.cpp releases dev-tools 8d ago

b9774

vulkan: support all backend tests for SQR/SQRT/SIN/COS/CLAMP/LEAKY_RELU/NORM (#24582). vulkan: make SQR/SQRT/SIN/COS/CLAMP/LEAKY_RELU use unary.comp; vulkan: make NORM support noncontig; add noncontiguous row test cases for norm/l2_norm, handle this in the CPU backend and…

31
llama.cpp releases dev-tools 8d ago

b9773

vulkan: Support GET_ROWS_BACK (#24883)

23
llama.cpp releases dev-tools 8d ago

b9771

vulkan: make mul_mm ALIGNED a spec constant (#24689). This trims down some of the shader variant explosion and reduces binary size.

26
llama.cpp releases dev-tools 8d ago

b9770

server: fix remote preset handling, add test (#24938). server: add test for remote preset; fix remote preset handling; fix; fix test

20
llama.cpp releases dev-tools 8d ago

b9769

vulkan: link ggml-cpu when GGML_VULKAN_CHECK_RESULTS / RUN_TESTS are enabled (#24444). The result-checking and test debug paths in ggml-vulkan.cpp call ggml_graph_compute_with_ctx() to compute a CPU reference graph, but that symbol is defined in ggml-cpu, which ggml-vulkan does…

37
llama.cpp releases dev-tools 8d ago

b9768

model: Granite Speech Plus (#24818). feat: Add conversion support for Granite Speech Plus (Branch: GraniteSpeechPlus; AI-usage: full (Bob, OpenCode + Qwen3.6-35b); Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>); feat: Extend granite_speech to support plus multi-layer concatenation…

27
llama.cpp releases dev-tools 9d ago

b9767

ggml-webgpu: improve MTP inference by using mat-vec path for small batches (#24811). ggml-webgpu: improve small batches decoding; Add barrier to the NUM_COLS loop in mul-mat-vec

21
llama.cpp releases dev-tools 9d ago

b9765

server: improve user message detection and create checkpoints at ever…

20
llama.cpp releases dev-tools 9d ago

b9763

server : Add id to tool call responses api (#24882)

32
llama.cpp releases dev-tools 9d ago

b9761

server: (router) move model downloading to dedicated process (#24834). server: real-time model load progress tracking via /models/sse update docs server: move model download to child process rm unused fix most problems clean up nit fixes fix test case do not detact() thread…

8
llama.cpp releases dev-tools 9d ago

b9760

server: refactor/generalize input file schema (#24299). server: refactor/generalize input file schema; wire up input_video, accept raw base64; nits; nits (2); fix windows

36
llama.cpp releases dev-tools 9d ago

b9758

[SYCL] support bf16 on bin_bcast OP and unary OPs (#24838). support bf16 on bin_bcast OP and unary OPs; support the older Intel compiler than 2026.0

23
llama.cpp releases dev-tools 9d ago

b9757

sampling : remove unconditional softmax+sort in top-n-sigma sampler (#22645)

13
llama.cpp releases dev-tools 10d ago

b9756

server: fix edit_file crash on append at end of file (line_start -1) (#24893). line_start -1 normalized to n+1, so append inserted at lines.begin() + n + 1, one past end() -> heap-buffer-overflow in vector::_M_range_insert. Normalize -1 to n (insert at end()), restrict -1 to…

24
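The off-by-one described in the b9756 entry can be illustrated with a small standalone sketch. The helper name `insert_lines` and its exact semantics are assumptions made for illustration only; this is not the server's edit_file code, and the truncated part of the release note (what -1 is restricted to) is not reproduced here.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, not the server's actual code: insert `extra` so that it
// follows the first `line_start` lines. line_start == -1 means "append".
static void insert_lines(std::vector<std::string> & lines, int line_start,
                         const std::vector<std::string> & extra) {
    const int n = static_cast<int>(lines.size());
    if (line_start == -1) {
        line_start = n; // normalize to n, so the insert position is end()
    }
    if (line_start < 0 || line_start > n) {
        throw std::out_of_range("line_start out of range");
    }
    // begin() + n == end() is the last valid insert position; begin() + n + 1 is not.
    assert(lines.begin() + line_start <= lines.end());
    lines.insert(lines.begin() + line_start, extra.begin(), extra.end());
}
```

Normalizing -1 to n + 1 instead would pass an iterator one past end() to insert(), which is undefined behavior and matches the heap-buffer-overflow the entry reports.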
llama.cpp releases dev-tools 10d ago

b9755

docs/android.md: Add dependency libandroid-spawn for building in te…

7
llama.cpp releases dev-tools 10d ago

b9754

common/peg : implement ac parser for stricter grammar generation (#24869). common/peg : implement ac parser; cont : extract functions; cont : tidy up; cont : remove a test; cont : move ac() def

33
llama.cpp releases dev-tools 10d ago

b9753

server: fix report progress for loading spec models, add "stages" list (#24870). server: fix report progress for loading spec models, add "stages" list; improve; nits; nits 2

28
llama.cpp releases dev-tools 10d ago

b9752

server: refactor batch construction (#24843). server: refactor batch construction wip wip 2 wip 3 wip 4 add abort_all_slots handle batch full more carefully fix assert rm debug log small nits (debug) add timings debug: force llama_synchronize for accurate timings address…

5
llama.cpp releases dev-tools 10d ago

b9751

mtmd: fix mtmd_get_memory_usage (#24867)

13
llama.cpp releases dev-tools 10d ago

b9750

jinja : implement call statement (#24847). implement call statement; undo unintended change; de-lambda; simplify; move caller context inside function handler

9
llama.cpp releases dev-tools 10d ago

b9748

server: add "verbose" field to schema (#24864)

18
llama.cpp releases dev-tools 10d ago

b9747

server: real-time model load progress tracking via /models/sse (#24828). server: real-time model load progress tracking via /models/sse; update docs; add mutex for notify_to_router; correct docs

28
llama.cpp releases dev-tools 10d ago

b9745

spec : Support Step3.5/3.7 flash mtp3 (#24340). add mtp_layer_offset + include nextn flags in graph reuse; add llama_set_mtp_layer_offset + llama_model_n_nextn_layer API; offset head select + require all MTP blocks; speculative multi-head process(); speculative multi-head draft()…

6
llama.cpp releases dev-tools 11d ago

b9744

common/peg : refactor until gbnf grammar generation (#24839). common/peg : refactor until gbnf grammar into an ac automaton; cont : add a test with multiple strings; cont : pad state with 0s so rules line up; cont : clean up comments; cont : use set everywhere; cont : inline state…

4
llama.cpp releases dev-tools 11d ago

b9743

common/json-schema-to-grammar : align spacing rules with parsers (#24835)

11
llama.cpp releases dev-tools 11d ago

b9742

fix(hexagon): use padded stride for ssm-conv weights (#24470)

4
llama.cpp releases dev-tools 11d ago

b9741

llama : use LLM_KV for quantization_version & file_type (#24802). Signed-off-by: Adrien Gallouët <angt@huggingface.co>

27
llama.cpp releases dev-tools 11d ago

b9740

arg: try fixing test-args-parser randomly fails (#24826). arg: try fixing test-args-parser randomly fails return ref try triggering the workflow exception wrapper wip test test 2 arg: guard win32 utf8 argv override make_utf8_argv rebuilds argv from GetCommandLineW to fix utf8…

8
llama.cpp releases dev-tools 11d ago

b9739

release: add missing link for win opencl adreno arm64 (#24809)

33
llama.cpp releases dev-tools 11d ago

b9738

server: avoid forwarding auth headers in CORS proxy (#24373). server: avoid forwarding auth headers in CORS proxy; format; fix test; fix e2e test. Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

19
llama.cpp releases dev-tools 11d ago

b9736

model : glm-dsa load DSA indexer tensors as optional (#24770). GLM-5.2 ships the DSA "lightning indexer" on only a subset of layers (the "full" layers; others omit it), but the GLM_DSA loader created the five indexer tensors on every layer as required, so loading any GLM-5.2…

14
llama.cpp releases dev-tools 11d ago

b9735

ggml : optimize AMX (#24806). Flatten the partition over n_batch * M so every thread participates in the quantization. | CPU | Model | Test | t/s OLD | t/s NEW | Speedup |…

37
llama.cpp releases dev-tools 11d ago

b9737

docker : prebuild web UI for s390x build [no release] (#24829)

31
llama.cpp releases dev-tools 12d ago

b9733

ggml-webgpu: add adapter toggles for F16 on Vulkan + NVIDIA

11
llama.cpp releases dev-tools 12d ago

b9732

server: refactor child --> router communication (#24821). server: refactor child --> router communication; fix wakeup case; add docs; improve update_status(); nits

13
llama.cpp releases dev-tools 12d ago

b9731

server : optimize get_token_probabilities (#24796). Use std::partial_sort to order only the requested top-n tokens instead of the full vocabulary. logprobs sort: vocab=128000 n_top=0 iters=100; full sort: 8555.6 us/op; partial sort: 704.3 us/op. Signed-off-by: Adrien Gallouët…

37
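The technique named in the b9731 entry, ordering only the requested top-n items with std::partial_sort rather than sorting everything, can be sketched as follows. The `token_prob` struct and the `top_n_probs` function are hypothetical names chosen for this illustration; they are not taken from the llama.cpp server source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record type for illustration only.
struct token_prob {
    int   id;
    float p;
};

// Return the n_top most probable entries in descending order of p.
// Only the first n_top positions are fully ordered; the rest are discarded.
static std::vector<token_prob> top_n_probs(std::vector<token_prob> probs, std::size_t n_top) {
    n_top = std::min(n_top, probs.size());
    assert(n_top <= probs.size());
    const auto middle = probs.begin() + static_cast<std::ptrdiff_t>(n_top);
    std::partial_sort(probs.begin(), middle, probs.end(),
                      [](const token_prob & a, const token_prob & b) { return a.p > b.p; });
    probs.resize(n_top);
    return probs;
}
```

std::partial_sort performs roughly N log n_top comparisons instead of the N log N of a full sort, which is the kind of saving the entry's benchmark figures point to when n_top is much smaller than the vocabulary size.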
llama.cpp releases dev-tools 12d ago

b9730

mtmd, arg: fix utf8 handling on windows (#24779). mtmd, arg: fix utf8 handling on windows; also fix ggml_fopen; fix build fail; also fix CLI

36
llama.cpp releases dev-tools 12d ago

b9729

server: remove all internal mentions about "webui" (#24817)

32
llama.cpp releases dev-tools 12d ago

b9728

arg: Add comment line support to --api-key-file (#23168)

25
llama.cpp releases dev-tools 12d ago

b9727

vendor : update cpp-httplib to 0.48.0 (#24787)

38
llama.cpp releases dev-tools 12d ago

b9726

server: add --agent arg, remove redundant webui naming compat (#24801). server: add --agent arg, remove redundant webui naming compat; corrent env; fix the test; llama-gen-docs; nits: wordings

10
llama.cpp releases dev-tools 12d ago

b9725: docker : build the UI (#24794)

docker : build the UI; cont : use existing APP_VERSION

5
llama.cpp releases dev-tools 12d ago

b9724

mtmd: several bug fixes (#24784). mtmd: several bug fixes; fix build; fix gemma4ua; add sanity check in get_u32(); fix build (2); area() avoid overflow

27
llama.cpp releases dev-tools 12d ago

b9723

spec: support eagle3 for qwen3.5 & 3.6 (#24593). spec: support qwen3.5 & 3.6 eagle3 draft; eagle3: Add deferred boundary checkpoints restore support for hybrid models; apply suggestions (Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>); spec: adapt to API change; spec: fix naming…

21

Page 2 of 10 · 468 articles

Prismix · © 2026 · AI Hub
