llama.cpp releases

469 articles archived · Visit source ↗ · RSS

llama.cpp releases dev-tools 1mo ago

b9353

server : fix the log message when using SSL ( #23393 ) When llama-server is started with SSL key and cert, the log says that it listens on http instead of https. This patch fixes this. macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS…

7
llama.cpp releases dev-tools 1mo ago

b9352

ggml-zendnn : fixed naming of matmul function ( #20964 ) ggml-zendnn: fixed naming of matmul function ggml-zendnn: fixed naming of mul_mat_id function ggml-zendnn: fixed print in mul_mat_id Co-authored-by: plotnikov.v10 plotnikov.v10@wb.ru macOS/iOS: macOS Apple Silicon (arm64)…

4
llama.cpp releases dev-tools 1mo ago

b9351

macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64 (ROCm 7.2) Ubuntu x64 (OpenVINO) Ubuntu x64…

20
llama.cpp releases dev-tools 1mo ago

b9334

CUDA: missing PDL sync for FWHT, better fallback ( #23690 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

19
llama.cpp releases dev-tools 1mo ago

b9333

metal : add apple device id ( #23566 ) Co-authored-by: lvyichen lvyichen@stepfun.com macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64…

14
llama.cpp releases dev-tools 1mo ago

b9331

ci : reduce PR jobs by matching backend paths ( #23675 ) ci : disable SYCL f16 builds ci : extract android and hip into separate workflows ci : move webgpu to separate workflow ci : move the rpc to a separate workflow ci : extract s390x and ppcl jobs ci : extract opencl job into…

24
llama.cpp releases dev-tools 1mo ago

b9341: convert : support Gemma4ForCausalLM architecture (#23682)

convert : support Gemma4ForCausalLM architecture ( #23674 ) fix indent Co-authored-by: Oleg Afonin your.email@example.com Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com

15
llama.cpp releases dev-tools 1mo ago

b9330

model: tag ffn_latent as MUL_MAT to fix buft probe ( #23664 ) ffn_latent_down/up are declared GGML_OP_MUL in LLM_TENSOR_INFOS but nemotron-h feeds them through ggml_mul_mat. The loader buft probe asks the backend about the declared op, so it tested an elementwise MUL on a q8_0…

7
llama.cpp releases dev-tools 1mo ago

b9329

CUDA: add fast walsh-hadamard transform ( #23615 ) CUDA: add fast walsh-hadamard transform review: add unrolls + change size_t -> int warp size 64 Co-authored-by: Johannes Gäßler johannesg@5d6.de macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI…
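
For readers unfamiliar with the transform named in this release, here is a minimal CPU sketch of an unnormalized fast Walsh-Hadamard transform in Python. It is not the CUDA kernel from the PR; the function name `fwht` and the choice to leave the result unnormalized are assumptions made for illustration.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform (illustrative sketch).

    The input length must be a power of two. Returns a new list.
    """
    out = list(values)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly step: combine elements that are h apart within each block of 2*h.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


if __name__ == "__main__":
    print(fwht([1, 0, 1, 0, 0, 1, 1, 0]))
```

Applying the unnormalized transform twice returns the input scaled by its length, which makes a convenient self-check.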

11
llama.cpp releases dev-tools 1mo ago

b9326

sync : ggml macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64 (ROCm 7.2) Ubuntu x64 (OpenVINO)…

26
llama.cpp releases dev-tools 1mo ago

b9320

TP: fix ggml context size calculation ( #22616 ) TP: fix ggml context size calculation, memory leak move split state cache back into the context revert to constant ggml context size for cgraphs increase headroom for statically allocated tensors remove obsolete include macOS/iOS:…

33
llama.cpp releases dev-tools 1mo ago

b9319

ggml: gguf_init_from_callback and gguf_init_from_buffer ( #22341 ) ggml: implement gguf_init_from_buffer test: gguf_init_from_buffer fix: memory breakdown for a model loaded with no_alloc from a file is consistent with being loaded from a buffer fix: use GGML_UNUSED…

9
llama.cpp releases dev-tools 1mo ago

b9318

server: MTP layer kv-cache should respect draft type ctk ( #23646 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…

9
llama.cpp releases dev-tools 1mo ago

b9315

llama : document that only one on-device state can be saved per sequence ( #23520 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64…

13
llama.cpp releases dev-tools 1mo ago

b9313

ggml : Parallelize quant LUT init ( #23595 ) Use OpenMP to parallelize iq2xs_init_impl and iq3xs_init_impl. Move the OpenMP detection from ggml-cpu to ggml-base. Update OpenMP dependencies in ggml-config.cmake.in. macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon…

28
llama.cpp releases dev-tools 1mo ago

b9311

vendor : update cpp-httplib to 0.45.1 ( #23639 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu…

25
llama.cpp releases dev-tools 1mo ago

b9310

server: fix checkpoints creation ( #22929 ) common : add common_chat_split_by_role cont : fix spans to reach end of message server: fix checkpoints creation extract message_spans from chat templates find the prompt token position before the latest user message split prompt…

36
llama.cpp releases dev-tools 1mo ago

b9309: perplexity : fix even more integer overflows (#23623)

Co-authored-by: Stanisław Szymczyk sszymczy@gmail.com

30
llama.cpp releases dev-tools 1mo ago

b9305

cmake : fix ui build ( #23592 ) cmake/ui : add -fPIC to llama-ui static lib cmake : rename host compiled embed helper macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU)…

29
llama.cpp releases dev-tools 1mo ago

b9301

hexagon: apply repl optimization in flash attn softmax as #22993 ( #23 …

27
llama.cpp releases dev-tools 1mo ago

b9297

model : add NVFP4 MTP scale tensors ( #23563 ) Add NVFP4 MTP scale tensors Link Qwen3.5 MTP tensors Aligned nullptr macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU)…

11
llama.cpp releases dev-tools 1mo ago

b9296

ggml : Check the right iface method before using the fallback 2d get ( #23514 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64…

9
llama.cpp releases dev-tools 1mo ago

b9295

vulkan: fix windows find_package of SPIRV-Headers ( #23215 ) vulkan: fix windows find_package of SPIRV-Headers not windows-only macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu…

17
llama.cpp releases dev-tools 1mo ago

b9294

opencl: generalize Adreno MoE kernels on M ( #23449 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…

17
llama.cpp releases dev-tools 1mo ago

b9291

SYCL: improve MoE prefill throughput ( #23142 ) change k_copy_src1_to_contiguous so that it uses a precomputed contiguous mapping where all rows "owned" by an expert are in one slice with a known start and end; switch the O(n_as * n_routed_rows) contraption to a counting sort-based…
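
The counting-sort idea mentioned in this entry can be sketched on the CPU as follows. This is a generic illustration, not the SYCL code from the PR; `group_rows_by_expert`, `expert_ids`, and `n_experts` are hypothetical names. It builds a mapping in which every expert's rows occupy one contiguous slice, in a single counting pass plus a single placement pass.

```python
def group_rows_by_expert(expert_ids, n_experts):
    """Return (order, starts) so that rows of expert e are order[starts[e]:starts[e + 1]]."""
    # Pass 1: count how many rows are routed to each expert.
    counts = [0] * n_experts
    for e in expert_ids:
        counts[e] += 1

    # Exclusive prefix sum gives the start offset of each expert's slice.
    starts = [0] * (n_experts + 1)
    for e in range(n_experts):
        starts[e + 1] = starts[e] + counts[e]

    # Pass 2: place each row index into the next free position of its expert's slice.
    cursor = starts[:n_experts]
    order = [0] * len(expert_ids)
    for row, e in enumerate(expert_ids):
        order[cursor[e]] = row
        cursor[e] += 1
    return order, starts


if __name__ == "__main__":
    order, starts = group_rows_by_expert([2, 0, 2, 1, 0], n_experts=3)
    print(order, starts)
```

The work is linear in the number of routed rows plus the number of experts, rather than proportional to their product.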

27
llama.cpp releases dev-tools 1mo ago

b9292

perplexity : fix integer overflow ( #23496 ) Co-authored-by: Stanisław Szymczyk sszymczy@gmail.com macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU)…

13
llama.cpp releases dev-tools 1mo ago

b9290

sycl : Level Zero detection in ggml_sycl_init ( #23097 ) [SYCL] Centralize Level Zero detection in ggml_sycl_init use the same wording get back the warning macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework…

15
llama.cpp releases dev-tools 1mo ago

b9289

SYCL : gated_delta_net K>1 ( #23174 ) sycl_gated_delta_net K>1 editor_config macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…

34
llama.cpp releases dev-tools 1mo ago

b9286

ggml-zendnn : add Q8_0 quantization support ( #23414 ) ggml-zendnn : add Q8_0 quantization support ggml-zendnn : sync with latest ZenDNN ggml-zendnn : address review comments for Q8_0 macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS…

15
llama.cpp releases dev-tools 1mo ago

b9285

cmake : build router app only during standalone builds ( #23521 ) Co-authored-by: Stanisław Szymczyk sszymczy@gmail.com macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64…

17
llama.cpp releases dev-tools 1mo ago

b9284

vocab : fix HybridDNA tokenizer ( #23466 ) vocab : mark hybriddna k-mers to avoid BPE token collisions improved loop Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel…

9
llama.cpp releases dev-tools 1mo ago

b9283

cmake : add install() for impl libraries + fix apple builds ( #23511 ) pi : update ci : fix ios build ci : fix android ci : fix apple builds cmake : add install() for impl libraries Add install(TARGETS LIBRARY) for all -impl libraries that were changed from STATIC to shared…

14
llama.cpp releases dev-tools 1mo ago

b9279

vulkan: fuse snake activation (mul, sin, sqr, mul, add) ( #22855 ) vulkan: fuse snake activation (mul, sin, sqr, mul, add) Add snake.comp shader with F32 / F16 / BF16 pipelines and ggml_vk_snake_dispatch_fused. The matcher recognizes the naive 5 op decomposition emitted by audio…
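
As background, the snake activation is commonly written as x + sin²(αx)/α. The sketch below, in Python, shows one plausible five-step decomposition matching the op order listed in this entry (mul, sin, sqr, mul, add) next to a single-pass equivalent. The exact operands used in ggml graphs are an assumption here, and `snake_naive` and `snake_fused` are hypothetical names, not functions from the PR.

```python
import math


def snake_naive(x, alpha):
    """Five separate elementwise passes (assumed operand order)."""
    inv_alpha = 1.0 / alpha
    t = [alpha * v for v in x]            # mul: alpha * x
    t = [math.sin(v) for v in t]          # sin
    t = [v * v for v in t]                # sqr
    t = [inv_alpha * v for v in t]        # mul: (1 / alpha) * sin^2
    return [a + b for a, b in zip(x, t)]  # add: x + ...


def snake_fused(x, alpha):
    """Same result computed in a single pass, which is the point of fusing."""
    return [v + math.sin(alpha * v) ** 2 / alpha for v in x]


if __name__ == "__main__":
    xs = [-1.0, 0.0, 0.5, 2.0]
    print(snake_naive(xs, 0.7))
    print(snake_fused(xs, 0.7))
```

A fused version avoids writing four intermediate tensors, which is the usual motivation for this kind of pattern matching.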

23
llama.cpp releases dev-tools 1mo ago

b9277

tests : move save-load-state from examples to tests ( #23336 ) tests : move save-load-state from examples to tests Move examples/save-load-state/ to tests/test-save-load-state.cpp Remove subdirectory reference from examples/CMakeLists.txt Add test to tests/CMakeLists.txt as a…

25
llama.cpp releases dev-tools 1mo ago

b9276

server: expose prompt token counts in /slots endpoint ( #23454 ) Add n_prompt_tokens, n_prompt_tokens_processed, and n_prompt_tokens_cache to the /slots JSON response. These fields are already tracked internally but were not exposed, making it impossible for clients to monitor…
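
A client could read the new counters roughly as sketched below in Python. The three field names come from the release note; the assumption that `/slots` returns a JSON array of objects with an `id` key, the helper name `prompt_counters`, and the sample values are all illustrative and should be checked against the server documentation.

```python
import json

# Field names taken from the release note above.
PROMPT_FIELDS = (
    "n_prompt_tokens",
    "n_prompt_tokens_processed",
    "n_prompt_tokens_cache",
)


def prompt_counters(slots_json):
    """Map slot id -> prompt counters; a missing field (older server) becomes None."""
    report = {}
    for slot in json.loads(slots_json):
        report[slot["id"]] = {name: slot.get(name) for name in PROMPT_FIELDS}
    return report


# Made-up payload for illustration only; real responses carry many more fields.
SAMPLE = (
    '[{"id": 0, "n_prompt_tokens": 512, "n_prompt_tokens_processed": 128,'
    ' "n_prompt_tokens_cache": 384}, {"id": 1}]'
)

if __name__ == "__main__":
    for slot_id, counters in prompt_counters(SAMPLE).items():
        print(slot_id, counters)
```

Using `.get` keeps the client tolerant of servers built before these fields were exposed.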

15
llama.cpp releases dev-tools 1mo ago

b9275

metal : optimize concat kernel and fix set kernel threads ( #23411 ) metal : fix GGML_OP_SET kernel threads tests : extend test_cpy to support different src/dst shapes Extend test_cpy to support different source and destination tensor shapes for CPY operations (reshaping), where…

37
llama.cpp releases dev-tools 1mo ago

b9274

server : free draft/MTP resources on sleep to fix VRAM leak ( #23461 ) The destroy() function in server_context_impl only cleaned up the main model and context (via llama_init.reset()) but did not free the speculative decoder (spec), draft context (ctx_dft), or draft model…

22
llama.cpp releases dev-tools 1mo ago

b9282

CUDA: fix PDL CC check for JIT compilation ( #23471 )

31
llama.cpp releases dev-tools 1mo ago

b9273

server: re-inject subcommand when router spawns children under unified binary ( #23442 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu…

32
llama.cpp releases dev-tools 1mo ago

b9272

app : add batched-bench, fit-params, quantize & perplexity ( #23459 ) app : add batched-bench, fit-params, quantize & perplexity Signed-off-by: Adrien Gallouët angt@huggingface.co Add missing main.cpp Signed-off-by: Adrien Gallouët angt@huggingface.co Add EOL Signed-off-by:…

37
llama.cpp releases dev-tools 1mo ago

b9271

mtp: use inp_out_ids for skipping logit computation ( #23433 ) when doing a follow-up decode for the draft model, we were always doing the logit computation even though it is not required. macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS…

23
llama.cpp releases dev-tools 1mo ago

b9270

vocab : add Carbon-3B (HybridDNATokenizer) support ( #23410 ) vocab : add Carbon-3B (HybridDNATokenizer) support Adds a new BPE pre-type LLAMA_VOCAB_PRE_TYPE_CARBON for the HybridDNATokenizer used by HuggingFaceBio/Carbon-{500M,3B,8B}. The base BPE is Qwen3-4B-Base's; what…

11
llama.cpp releases dev-tools 1mo ago

b9267

ggml : Check the right iface method before using the fallback 2d get ( #23306 ) Probably no backends implement only one of 2d get/set, but this might be annoying for some future backend developer trying to add 2d get/set. macOS/iOS: macOS Apple Silicon (arm64) macOS Apple…

11
llama.cpp releases dev-tools 1mo ago

b9266

llama-graph: fix null-buffer crash in llm_graph_input_attn_kv_iswa for SWA-only models ( #23131 ) When a model has zero non-SWA attention layers (e.g. a SWA-only slice of Gemma 4), the base KV cache has no layer tensors. The input tensors (self_k_idxs, self_v_idxs, self_kq_mask)…

13
llama.cpp releases dev-tools 1mo ago

b9264

app : show version ( #23426 ) Signed-off-by: Adrien Gallouët angt@huggingface.co macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64…

26
llama.cpp releases dev-tools 1mo ago

b9265: hexagon: ssm-conv fix for large prompts (#23307)

hexagon: remove gathers and better handling of vtcm in ssm-conv hexagon: relax ssm-conv gating requirements hexagon: add new prefill ssm-conv backend test hexagon: remove trailing white space hex-rope: uninline rope_cache_init, otherwise it breaks after rebasing with SSM_CONV…

34
llama.cpp releases dev-tools 1mo ago

b9263

mtmd, model : merge HunyuanOCR into HunyuanVL and fix OCR vision precision ( #23329 ) HunyuanOCR shares the same HF arch and vision layout as HunyuanVL but was split into a separate path that skipped the +0.1 bilinear sampler used by the HF reference. Collapse OCR into the…

12
llama.cpp releases dev-tools 1mo ago

b9260

opencl: refactor backend initialization ( #23318 ) opencl: refactor initialization opencl: refactor GPU identification opencl: rename for consistency opencl: cache global mem size in dev_ctx opencl: adjust log level opencl: load argsort and flash_attn kernels in supports_op…

7
llama.cpp releases dev-tools 1mo ago

b9259

common/speculative : fix nullptr crash in get_devices_str ( #23386 ) ggml_backend_dev_by_name always appends a nullptr sentinel to the devices vector. Skipping nullptr entries prevents assertion failure in ggml_backend_dev_name. Assisted-by: llama.cpp:local pi macOS/iOS: macOS…

20
llama.cpp releases dev-tools 1mo ago

b9258

mtmd : DeepSeek-OCR image processing fixes, img_tool::resize padding refactor ( #23345 ) mtmd : deepseek-ocr fixes, improvements and refactoring image processing changes to achieve full parity with Pillow (reference impl) SAM mask casting only when flash-attn is on SAM refactor…

24

Page 8 of 10 · 469 articles