Home Status News MCP Pricing Sign in

llama.cpp releases

469 articles archived · Visit source ↗ · RSS

llama.cpp releases dev-tools 28d ago

b9509

server: avoid unnecessary checkpoint restore when new tokens are present ( #24110 ) The pos_min_thold calculation unconditionally subtracts 1 to ensure at least one token is evaluated for logits when no new…

21
llama.cpp releases dev-tools 28d ago

b9505

server : add header to tools/server/server-http.h ( #24089 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…

29
llama.cpp releases dev-tools 28d ago

b9504

cmake: skip cvector-generator and export-lora when CPU backend is disabled ( #24053 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU)…

4
llama.cpp releases dev-tools 28d ago

b9503

fix(mtmd): handle Gemma 4 audio projector embedding size ( #24091 ) mtmd: handle Gemma 4 audio projector embedding size rm projection_dim from clip_n_mmproj_embd Co-authored-by: Xuan Son Nguyen son@huggingface.co macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64,…

28
llama.cpp releases dev-tools 28d ago

b9500

metal : reduce rset heartbeat from 500ms -> 5ms ( #24074 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…

37
llama.cpp releases dev-tools 28d ago

b9499

ggml-webgpu: FlashAttention refactor + standardize quantization support ( #23834 ) Start work on flash_attn refactor Refactor Split k/v quantization Refactor and abstract quantization logic for flash_attn and mul_mat Add quantization support to tile path formatting Move to…

23
llama.cpp releases dev-tools 28d ago

b9498

ggml-cpu: extend RVV quantization vec dot to higher VLENs ( #22754 ) ggml-cpu: add rvv 512b,1024b impls for iq4_xs ggml-cpu: refactor; add rvv 512b, 1024b impls for q6_K, i-quants ggml-cpu: refactor; add 512 and 1024 implementations of tq3_s, iq3_xxs, iq2_s, iq2_xs, iq2_xxs…

22
llama.cpp releases dev-tools 28d ago

b9501: tests : refactor test-save-load-state to accept token input (#24073)

tests : refactor test-save-load-state to accept token input Default prompt is now empty; when not provided, generate n_batch random tokens (useful for models without a tokenizer) Tokenization happens once upfront; pass token vector to test functions generate_tokens prints token…

26
llama.cpp releases dev-tools 28d ago

b9496

mtmd: fix Gemma 4 unified FPE ( #24088 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu…

27
llama.cpp releases dev-tools 29d ago

b9495

qwen35: use post-norm hidden state for MTP ( #24025 ) rename pre_norm to nextn fix step35 macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64…

37
llama.cpp releases dev-tools 29d ago

b9494

mtmd: enable non-causal vision for gemma 4 unified ( #24082 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…

22
llama.cpp releases dev-tools 29d ago

b9493

mtmd, model: allow skip build_vit() ( #24077 ) add model nits macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…

26
llama.cpp releases dev-tools 29d ago

b9491

Avoid PDL race conditions by disabling restrict when PDL is used ( #24030 ) Removes restrict from PDL kernel headers due to incompatibility with PDL. Adds preprocessor directives based on arch in kernel body to add restrict to retain performance on older architectures.…

9
llama.cpp releases dev-tools 29d ago

b9490

ggml-cpu: use runtime SVE width in FWHT ( #24059 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

32
llama.cpp releases dev-tools 29d ago

b9489

cuda: reserve space for quantize kv-cache at startup ( #23907 ) address review comments remove forward decl Co-authored-by: Johannes Gäßler johannesg@5d6.de remove assert in ggml-cuda.cu Co-authored-by: Johannes Gäßler…

25
llama.cpp releases dev-tools 29d ago

b9488

tests : add support for qwen3 SSM archs ( #24031 ) arch : add LLM_KV_ATTENTION_RECURRENT_LAYERS cont : naming + TODOs macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS…

24
llama.cpp releases dev-tools 29d ago

b9486

ci : disable ccache for msvc windows release jobs ( #23911 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…

6
llama.cpp releases dev-tools 29d ago

b9487

update BoringSSL to 0.20260526.0 ( #23794 )

28
llama.cpp releases dev-tools 29d ago

b9485

arg : removed unnecessary mmproj download when users pass --no-mmproj ( #23425 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64…

38
llama.cpp releases dev-tools 29d ago

b9484

opencl: use flat variants of q4_K and q6_K gemv for very large M ( #24006 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64…

6
llama.cpp releases dev-tools 29d ago

b9483

hexagon: profiler output fix and script updates ( #24042 ) hex-ops: fix profiler output (ie remove the redundant NONEs) hex-prof: update profiling script to support tot.usec column macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED…

26
llama.cpp releases dev-tools 29d ago

b9482

model: add Mellum architecture ( #23966 ) model: support for Mellum architecture model: improve mellum.py formatting model: improve mellum.py formatting once again deps: downgrade transformers to 4.57.6 (to fix CI) deps: remove huggingface_hub dependency deps: remove…

13
llama.cpp releases dev-tools 1mo ago

b9481

model : support granite multilingual embeddings R2 (ibm-granite/granite-embedding-{97,311}m-multilingual-r2) ( #22716 ) Add support for the ibm-granite/granite-embedding-{97m,311m}-multilingual-r2 embedding models: Added a version of the gpt4o tokenizer that has a fixed regex…

22
llama.cpp releases dev-tools 1mo ago

b9480

StepFun 3.5 MTP ( #23274 ) Simplify to single layer Rollback core changes fix flake8 errors Remove scripts modify to convention Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com dos2unix Co-authored-by: Sigbjørn…

14
llama.cpp releases dev-tools 1mo ago

b9479

common : fix state save in common_prompt_batch_decode ( #23468 ) This commit addresses a bug in common_prompt_batch_decode that affects the session state store/restore in completion.cpp and save-load-state.cpp. The motivation…

10
llama.cpp releases dev-tools 1mo ago

b9478

server: add SSE ping interval ( #24013 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu…

17
llama.cpp releases dev-tools 1mo ago

b9474

ui: Add Thinking mode toggle with reasoning effort levels + improvements for Chat Form Add Action UI ( #23434 ) feat: Add "Thinking" toggle and status icon + redesign Chat Form Actions Add panel test: Update test reference fix: Icon fix: E2E test command fix: wait for greeting…

5
llama.cpp releases dev-tools 1mo ago

b9473

kv-cache : SWA checkpoints store only non-masked cells ( #23981 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…

37
llama.cpp releases dev-tools 1mo ago

b9471

llama : deprecate llama_set_warmup ( #24009 ) cont : fix type Co-authored-by: Daniel Bevenius daniel.bevenius@gmail.com macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64,…

19
llama.cpp releases dev-tools 1mo ago

b9470

hexagon: MUL_MAT, MUL_MAT_ID, FLASH_ATTN and GDN cleanup and optimizations for latest models ( #23989 ) hex-mm: initial support for F32 * F32 -> F32 matmuls hex-rms-norm: fix src1 stride use in fused rms_norm_mul hex-ops: clear spad pointers in the ops that clobber it This fixes…

10
llama.cpp releases dev-tools 1mo ago

b9469

hexagon: add gelu_quick ( #24007 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64…

37
llama.cpp releases dev-tools 1mo ago

b9468

server: real-time reasoning interruption via control endpoint ( #23971 ) Builds on the manual reasoning budget trigger from #23949 . Adds a CONTROL task that mirrors the CANCEL path on the live slot and calls…

17
llama.cpp releases dev-tools 1mo ago

b9467

clean up unused variables warnings ( #23975 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…

14
llama.cpp releases dev-tools 1mo ago

b9466

opencl: fix compiler warnings for non-adreno path ( #23922 ) opencl: fix const cast warning macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux:…

31
llama.cpp releases dev-tools 1mo ago

b9464

speculative : fix n_outputs_max and remove draft-simple auto-enable ( #23988 ) speculative : add common_speculative_n_max helper function Extract the speculative max-draft-size logic from server_n_outputs_max into a reusable common_speculative_n_max() function in…

7
llama.cpp releases dev-tools 1mo ago

b9460

llama: limit max outputs of llama_context ( #23861 ) llama: save more VRAM by reserving n_outputs == n_seqs when possible add n_outputs_per_seq move n_outputs_max to server-context change ubatch to batch everywhere macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon…

15
llama.cpp releases dev-tools 1mo ago

b9459

metal: template GLU kernels to support f16/f32 ( #23882 ) Drops the hardcoded f32 GLU kernels in favor of a single template. We now load/store in the native tensor type (half or float) to save memory bandwidth, but keep the actual ALU compute in float to avoid exploding math in…

35
llama.cpp releases dev-tools 1mo ago

b9458

vulkan: don't hold the device mutex while compiling pipelines ( #23641 ) We need to hold a lock while we traverse all pipelines and lazily initialize them, but we don't need to hold it while the pipeline is being…

37
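The locking idea in the b9458 summary can be illustrated with a minimal Python sketch. This is not the Vulkan backend's code: `PipelineCache`, `ensure_compiled`, and `compile_fn` are hypothetical names chosen for illustration. The lock is held only while scanning for pipelines that still need work and while publishing results, and the slow compile step runs with the lock released.

```python
import threading


class PipelineCache:
    """Hypothetical cache; all names here are illustrative assumptions."""

    def __init__(self, compile_fn):
        self.lock = threading.Lock()
        self.compile_fn = compile_fn  # slow step, e.g. shader compilation
        self.compiled = {}            # name -> compiled pipeline
        self.in_progress = set()      # names claimed by some thread

    def ensure_compiled(self, names):
        # Hold the lock only while traversing and claiming work.
        with self.lock:
            todo = [n for n in names
                    if n not in self.compiled and n not in self.in_progress]
            self.in_progress.update(todo)

        results = {}
        try:
            # Slow compilation runs without the lock held.
            for n in todo:
                results[n] = self.compile_fn(n)
        finally:
            # Re-take the lock briefly to publish results and release claims.
            with self.lock:
                self.compiled.update(results)
                self.in_progress.difference_update(todo)
        return todo
```

In this sketch a caller that finds a pipeline already claimed by another thread returns without waiting for it; a real implementation would need some way to wait for that compile to finish.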
llama.cpp releases dev-tools 1mo ago

b9457

vulkan: reduce host memory lock contention ( #23376 ) vulkan: reduces lock contention replace unique_lock with lock_guard macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU)…

26
llama.cpp releases dev-tools 1mo ago

b9455

TP: quantized KV cache support ( #23792 ) fix partial view remove overly strict assert macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu…

15
llama.cpp releases dev-tools 1mo ago

b9453

model: Add EXAONE 4.5 implementations ( #21733 ) Add EXAONE 4.5 and Add GQA for MMproj mtmd: EXAONE 4.5 vision markers and projector path EXAONE 4.5 uses and for image boundaries; Qwen keeps <|vision_start|> and <|vision_end|>. Route EXAONE 4.5 through the Qwen2.5-VL-style…

32
llama.cpp releases dev-tools 1mo ago

b9452

vulkan: Block-load Q3_K/Q6_K block data and subtract on 32b ints ( #23056 ) Q2_K/Q3_K/Q6_K do much better when using MMVQ on Intel BMG even though they're only 2-byte aligned, and Q3_K still wins on NVIDIA as well. mesa isn't all that great at coalescing back-to-back loads from…

4
llama.cpp releases dev-tools 1mo ago

b9451

vulkan: Removed unused functions ( #23175 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…

37
llama.cpp releases dev-tools 1mo ago

b9445: ci: remove redundant or duplicate jobs (#23927)

remove redundant apple job openvino gpu and cpu test can share the same build and machine Update build-rpc.yml Update build-openvino.yml cpu any doesn't make sense as we have an arm job already, so do high perf on both x86 and arm remove duplicate x86 vulkan combine backend…

31
llama.cpp releases dev-tools 1mo ago

b9444

server : handle If-None-Match weak ETags ( #23916 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

12
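For context on the b9444 entry: HTTP specifies that `If-None-Match` uses weak comparison, so `W/"abc"` and `"abc"` count as the same tag. The Python sketch below shows that comparison rule only. It is not the llama.cpp server's implementation, and the function names are hypothetical. It also assumes opaque tags contain no commas, which a stricter parser would not assume.

```python
def strip_weak(tag: str) -> str:
    """Drop an optional W/ prefix so only the quoted opaque tag remains."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def if_none_match_hits(header_value: str, current_etag: str) -> bool:
    """Return True when the server could answer 304 Not Modified.

    Weak comparison: two tags match if their opaque parts are identical,
    regardless of whether either one carries the W/ prefix.
    """
    if header_value.strip() == "*":
        return True  # "*" matches any current representation
    current = strip_weak(current_etag)
    # Naive comma split (assumption: no commas inside the quoted tags).
    candidates = [c for c in header_value.split(",") if c.strip()]
    return any(strip_weak(c) == current for c in candidates)
```

For example, `if_none_match_hits('W/"abc"', '"abc"')` is true under weak comparison, whereas a strict byte-for-byte comparison of the two header strings would treat them as different.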
llama.cpp releases dev-tools 1mo ago

b9442

vocab : add tokenizer support for jina-embeddings-v2-base-zh ( #18756 ) vocab : add jina-embeddings-v2-base-zh (whitespace tokenizer) lowercase defaults to true type fix Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com macOS/iOS: macOS Apple Silicon (arm64) macOS…

12
llama.cpp releases dev-tools 1mo ago

b9441

ui: fix ETag truncation with MSVC compiler ( #23917 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

20
llama.cpp releases dev-tools 1mo ago

b9439

llama: only use one iGPU device by default ( #23897 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

4
llama.cpp releases dev-tools 1mo ago

b9438: webui: add custom CSS injection via config (#23904)

webui: add custom CSS injection via config register a customCSS setting in the Developer section under Custom JSON, syncable so it rides the existing ui-config pass through. inject the value into a single style element in the head, reactive on the setting. lets an operator theme…

28
llama.cpp releases dev-tools 1mo ago

b9437

Support -fa auto in llama-bench ( #23714 ) Make the default value of -ngl -1, similar to other tools. Update README with latest usage and examples Address review comments macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI…

16

Page 6 of 10 · 469 articles ← Newer Older →

Prismix · © 2026 · AI Hub

All product names and logos are trademarks of their respective owners.
