llama.cpp releases

468 articles archived · Visit source ↗ · RSS

llama.cpp releases dev-tools 13d ago

b9722

server: fix non-bound n_discard value (ctx shifting) ( #24786 ) server: fix non-bound n_discard value Update tools/server/server-context.cpp Co-authored-by: Georgi Gerganov ggerganov@gmail.com macOS/iOS: macOS Apple Silicon…

36
llama.cpp releases dev-tools 13d ago

b9721

sync : ggml macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64 (ROCm 7.2) Ubuntu x64…

31
llama.cpp releases dev-tools 13d ago

b9718

server : consolidate slot selection into get_available_slot ( #24755 ) Absorb get_slot_by_id logic into get_available_slot so slot selection is handled by a single function call. When a specific slot id is requested, the LCP similarity check still runs to enable proper prompt…

25
llama.cpp releases dev-tools 13d ago

b9717

ggml-cpu: support K tails in power10 Q8/Q4 MMA matmul ( #24753 ) ggml-cpu: support K tails in Power10 MMA Q8/Q4 matmul This patch removes the requirement that K be divisible by kc in the tinyBlas_Q0_PPC tiled matmul path. Process the final K panel using its actual depth and pass…
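
As a rough illustration of the "K tail" idea described above, the following sketch walks the K dimension in panels of depth `kc` and processes the final panel with its actual (possibly smaller) depth instead of requiring K to be divisible by `kc`. It is plain Python over nested lists, not the Power10 MMA kernel; the function name `matmul_k_panels` and its signature are hypothetical.

```python
def matmul_k_panels(a, b, kc):
    """Compute C = A @ B, walking K in panels of depth at most kc.

    a is M x K, b is K x N (nested lists). K does not need to be a
    multiple of kc: the last panel simply uses the remaining depth.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for k0 in range(0, k, kc):
        depth = min(kc, k - k0)  # actual depth of this panel (tail may be shorter)
        for i in range(m):
            for j in range(n):
                acc = 0.0
                for p in range(k0, k0 + depth):
                    acc += a[i][p] * b[p][j]
                c[i][j] += acc
    return c
```

With K = 5 and `kc = 2`, the loop runs two full panels and one tail panel of depth 1, and the result matches an ordinary matrix product.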

38
llama.cpp releases dev-tools 13d ago

b9716

mtmd: add batching support for internvl ( #24775 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

16
llama.cpp releases dev-tools 13d ago

b9715

Ggml/cuda col2im 1d ( #24417 ) cuda: add GGML_OP_COL2IM_1D, follow-up to the CPU op cuda: col2im_1d use fast_div_modulo for the index decomposition cuda: col2im_1d tighten supports_op, type match and contiguous dst macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon…

30
llama.cpp releases dev-tools 13d ago

b9714

server: add "X-Accel-Buffering": "no" header to streaming endpoints ( #24774 ) server: add "X-Accel-Buffering": "no" header to streaming endpoints This header tells Nginx (as a reverse proxy) to NOT buffer responses. (only affects streaming endpoints) Without it, Nginx will…
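
A minimal sketch of the idea, assuming a server that chooses response headers per endpoint type. Only the `X-Accel-Buffering: no` header comes from the entry above; the function name `response_headers` and the other header values are illustrative assumptions (common conventions for server-sent events), not a description of what llama-server actually sends.

```python
def response_headers(streaming: bool) -> dict:
    """Return HTTP response headers for a (hypothetical) completion endpoint."""
    if not streaming:
        # Non-streaming responses are left untouched.
        return {"Content-Type": "application/json"}
    return {
        # Assumed: typical headers for a server-sent-events stream.
        "Content-Type": "text/event-stream",
        "Cache-Control": "no-cache",
        # From the release note: asks Nginx, when acting as a reverse
        # proxy, not to buffer this response.
        "X-Accel-Buffering": "no",
    }
```

Scoping the header to streaming responses mirrors the note that only streaming endpoints are affected.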

11
llama.cpp releases dev-tools 13d ago

b9713

mtmd: add batching for mtmd-cli, add video tests ( #24778 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…

22
llama.cpp releases dev-tools 13d ago

b9712

cmake : fix ui build with read-only source ( #24752 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

4
llama.cpp releases dev-tools 13d ago

b9711

mtmd: refactor llava-uhd overview image handling (always use ov_img_first) ( #24769 ) add dedicated "overview" for mtmd_image_preproc_out corrections correct (again) nits nits (2) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS…

14
llama.cpp releases dev-tools 13d ago

b9707

server: add "schema" and validation ( #24150 ) wip working correct some limits add field name to error message macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64…

5
llama.cpp releases dev-tools 13d ago

b9704

server : return HTTP 400 on invalid grammar ( #24144 ) ( #24154 ) Throw on grammar parse failure so the server returns HTTP 400 instead of silently dropping the constraint. Add a regression test for the invalid-grammar response. Fixes #24144 macOS/iOS: macOS Apple Silicon…
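
The behavior change can be sketched as "throw on parse failure, map the exception to HTTP 400". The example below is a toy under stated assumptions: `parse_grammar` is a stand-in that only checks for `name ::= body` lines and a `root` rule, and `handle_completion` is a hypothetical handler returning a `(status, body)` pair. Neither reflects the real grammar parser or server code.

```python
class GrammarError(ValueError):
    """Raised when a grammar string cannot be parsed."""


def parse_grammar(text):
    """Toy parser: every non-empty line must look like 'name ::= body'."""
    rules = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, sep, body = line.partition("::=")
        if not sep or not name.strip() or not body.strip():
            raise GrammarError(f"cannot parse rule: {line!r}")
        rules[name.strip()] = body.strip()
    if "root" not in rules:
        raise GrammarError("grammar has no 'root' rule")
    return rules


def handle_completion(request):
    """Hypothetical handler: reject invalid grammars instead of ignoring them."""
    grammar = request.get("grammar")
    if grammar is not None:
        try:
            parse_grammar(grammar)
        except GrammarError as err:
            # Surface the failure to the client rather than silently
            # dropping the constraint.
            return 400, {"error": str(err)}
    return 200, {"content": ""}
```

A request carrying `{"grammar": "not a grammar"}` yields status 400 here, while a request without a grammar, or with a well-formed one, yields 200.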

26
llama.cpp releases dev-tools 13d ago

b9703

server: (router) rework -hf preset repo ( #24739 ) server: temporary remove HF remote preset rework remove preset.ini support rm unused get_remote_preset_whitelist() print warning add docs rm stray file macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI…

19
llama.cpp releases dev-tools 13d ago

b9702

server: fix router args not being forwarded to child instances ( #24760 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64…

14
llama.cpp releases dev-tools 13d ago

b9701

mtmd: refactor preprocessor, add mtmd_image_preproc_out ( #24736 ) add mtmd_image_preproc_out add dev docs remove unused clip API rm unused clip_image_f32_batch::grid change preprocess() call signature macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI…

15
llama.cpp releases dev-tools 14d ago

b9700

[SYCL] rename GGML_SYCL_SUPPORT_LEVEL_ZERO ( #24719 ) rename GGML_SYCL_SUPPORT_LEVEL_ZERO to GGML_SYCL_SUPPORT_LEVEL_ZERO_API, and GGML_SYCL_ENABLE_LEVEL_ZERO to GGML_SYCL_USE_LEVEL_ZERO_API fix code format fix error when rebase macOS/iOS: macOS Apple Silicon (arm64) macOS Apple…

31
llama.cpp releases dev-tools 14d ago

b9699

sycl : support MUL_MAT and OUT_PROD with Q1_0 ( #24721 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

31
llama.cpp releases dev-tools 14d ago

b9698

app : enable self-update only when built with llama-install.sh ( #24754 ) Signed-off-by: Adrien Gallouët angt@huggingface.co macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU)…

34
llama.cpp releases dev-tools 14d ago

b9697

ci : fix check-release message parsing ( #24751 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

25
llama.cpp releases dev-tools 14d ago

b9694

ci : fix Windows x64 (OpenVINO) release link ( #24731 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

28
llama.cpp releases dev-tools 14d ago

b9693

metal : check for BF16 support in concat kernel ( #24747 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…

16
llama.cpp releases dev-tools 14d ago

b9692

mtmd: llava_uhd should no longer use batch dim ( #24732 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…

32
llama.cpp releases dev-tools 14d ago

b9691

ggml-cpu: Conditionally enable power11 backend based on compiler support ( #24687 ) ggml: Conditionally enable power11 backend based on compiler support Guard POWER11 backend creation behind a compiler flag check for -mcpu=power11. This avoids build failures on current GCC/Clang…

14
llama.cpp releases dev-tools 14d ago

b9690

metal : implement rope_back operator ( #24725 ) Reuse existing rope kernels with a function constant to toggle forward/backward rotation, avoiding duplicate kernel code. Assisted-by: pi:llama.cpp/Qwen3.6-27B macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64,…

27
llama.cpp releases dev-tools 14d ago

b9689

metal : add f16 and bf16 support for concat operator ( #24724 ) metal : add f16 and bf16 support for concat operator Extend the Metal backend concat operator to support f16 and bf16 tensor types in addition to the existing f32 and i32 support. Template kernel_concat on type T…

34
llama.cpp releases dev-tools 14d ago

b9688

server: (router) add model management API ( #23976 ) wip server: (router) add SSE realtime updates API nits wip add download API add download api update docs add delete endpoint fix std::terminate fix crash fix 2 add tests nits macOS/iOS: macOS Apple Silicon (arm64) macOS Apple…

17
llama.cpp releases dev-tools 14d ago

b9687

llama : skip main_gpu validation when no devices are available ( #23405 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64…

11
llama.cpp releases dev-tools 14d ago

b9686

spec: fix segfault error on long prompts for eagle3 ( #24707 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…

17
llama.cpp releases dev-tools 14d ago

b9685

[SYCL] add dev2dev memcpy by SYCL API ( #24476 ) add dev2dev memcpy by SYCL API mv GGML_SYCL_DEV2DEV_MEMCPY to runtime table update the detect method for p2p comm fix the error created during fix conflict Co-authored-by: Neo Zhang macOS/iOS: macOS Apple Silicon (arm64) macOS…

33
llama.cpp releases dev-tools 14d ago

b9684

[SYCL] Add conv_3d ( #24691 ) add conv_3d optimize update ops.md restore test script rm unused code rm copyright notes macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu…

15
llama.cpp releases dev-tools 14d ago

b9682

vulkan: record actual memory properties during buffer creation ( #24326 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64…

9
llama.cpp releases dev-tools 14d ago

b9678

opencl: optimize mul_mat_f16_f32_l4 for decode ( #24504 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…

4
llama.cpp releases dev-tools 15d ago

b9677

common: update logging to enforce max_capacity and optimize queue resizing ( #24490 ) common: update logging to enforce max_capacity and optimize queue resizing logic common/log: remove queue expansion logic macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64,…

35
llama.cpp releases dev-tools 15d ago

b9675

sycl : enable fp16 support for ops SQR, SQRT, LOG, SIN, COS, CLAMP ( #24692 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu…

33
llama.cpp releases dev-tools 15d ago

b9674

SYCL: fix use-after-free bug with async memcpy in MoE prefill ( #24676 ) SYCL: fix a bug with async memcpy make mmid_row_mapping_host persistent comment on stream->wait Apply suggestion from @sanmai Apply suggestion from @sanmai Apply suggestion from @sanmai macOS/iOS: macOS…

34
llama.cpp releases dev-tools 15d ago

b9680: ci: fix vulkan docker images (#24595)

Update vulkan-shaders-gen.cpp Update vulkan-shaders-gen.cpp add comment describing code change intention Update vulkan-shaders-gen.cpp fix potential UB

24
llama.cpp releases dev-tools 15d ago

b9673

sycl: Add optional USM system allocations ( #22526 ) This introduces an optional feature to allocate large GPU buffers (≥ 1GB) using USM system allocations if supported by the device. It allows using buffers from the system allocator then letting the system manage memory…

18
llama.cpp releases dev-tools 15d ago

b9672

vendor : update BoringSSL to 0.20260616.0 ( #24693 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

31
llama.cpp releases dev-tools 15d ago

b9670

Fix and restrict NVFP4 edge-cases in llama-graph ( #24331 ) Move the post-GEMM MUL required for dequant before LoRA and bias add (see #23484): for LoRA, I would presume we want fully dequantized values before doing the residuals, but this depends on how the LoRAs were generated.…

26
llama.cpp releases dev-tools 15d ago

b9669

spec: add backend sampling support for eagle3 ( #24655 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

27
llama.cpp releases dev-tools 16d ago

b9668

vulkan: prefer host-visible memory buffers on UMA devices ( #22930 ) implement UMA host-visible memory update based on 0cc4m's suggestion macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu…

37
llama.cpp releases dev-tools 16d ago

b9667

vulkan: Support gated_delta_net with S_v=16 ( #24581 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

38
llama.cpp releases dev-tools 16d ago

b9665

bench : add --offline ( #24511 ) bench : add --offline Signed-off-by: Adrien Gallouët angt@huggingface.co Add default Signed-off-by: Adrien Gallouët angt@huggingface.co Signed-off-by: Adrien Gallouët angt@huggingface.co macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon…

29
llama.cpp releases dev-tools 16d ago

b9663

[SYCL] Support OP EXPM1, support all UT cases of FLOOR, TRUNC, ROUND ( #24363 ) support OP EXPM1, support all UT cases of FLOOR, TRUNC, ROUND fix conflict rebase, support new UT case of repeat, concat macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI…

17
llama.cpp releases dev-tools 16d ago

b9664: sycl: support reordered Q4_K/Q5_K/Q6_K MoE MUL_MAT_ID (#24452)

sycl: support reordered Q4_K and Q5_K MoE MUL_MAT_ID Extend reordered-weight handling to fused MoE MUL_MAT_ID for Q4_K and Q5_K expert tensors and add Q5_K reordered DMMV coverage. Unsupported 3D reorder cases now fall back instead of aborting. sycl: extend MoE reorder to Q6_K…

21
llama.cpp releases dev-tools 16d ago

b9661

vulkan: add col2im_1d op ( #24425 ) vulkan: add GGML_OP_COL2IM_1D, follow-up to the CPU op vulkan: col2im_1d bounded gather loop instead of full-K scan with modulo vulkan: col2im_1d address review from @jeffbolznv vulkan: col2im_1d return nullptr for unsupported types, address…
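
To make the "bounded gather loop instead of full-K scan with modulo" remark concrete, here is a small single-channel sketch. The column layout (`cols[k][l]` holds kernel tap `k` at sliding position `l`, with no padding or dilation) and the function name are assumptions chosen for illustration, not ggml's tensor layout or the Vulkan shader.

```python
def col2im_1d_gather(cols, stride):
    """Fold columns back into a 1D signal, gathering per output element.

    cols[k][l] contributes to output index l * stride + k. Instead of
    scanning every k and testing (n - k) % stride, the loop visits only
    the sliding positions l that can reach output index n.
    """
    K, L = len(cols), len(cols[0])
    out_len = (L - 1) * stride + K
    out = [0.0] * out_len
    for n in range(out_len):
        # k = n - l * stride must satisfy 0 <= k <= K - 1.
        l_min = max(0, -((K - 1 - n) // stride))  # ceil((n - K + 1) / stride)
        l_max = min(L - 1, n // stride)
        acc = 0.0
        for l in range(l_min, l_max + 1):
            acc += cols[n - l * stride][l]
        out[n] = acc
    return out
```

Each output element touches at most about `K / stride` column entries, whereas a full scan over `k` inspects all `K` taps and discards those that fail the modulo test.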

20
llama.cpp releases dev-tools 16d ago

b9660

chat : fix LFM2 tool-call parsing double-escaping ( #24667 ) Add escape test cases chat : fix LFM2 tool-call parsing double-escaping macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64…

20
llama.cpp releases dev-tools 16d ago

b9659

mtmd: fix miscounting n_tokens ( #24656 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…

20
llama.cpp releases dev-tools 16d ago

b9658

chat: include full unparsed prompt in debug message on parse error ( #24650 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64…

5
llama.cpp releases dev-tools 16d ago

b9656

chat: harden peg-native tool call parsing ( #24329 ) chat: harden peg-native tool call parsing accept an optional leading type: function field in build_json_tools_flat_keys so openai style tool calls parse on templates whose serialization opens on the name field. return a clean…

27
