llama.cpp releases

469 articles archived

llama.cpp releases dev-tools 23d ago

b9577

server: log prompts to directory ( #22031 ) Add --log-prompts-dir to write each prompt to a separate text file in the specified directory. Apply suggestion from @ngxson Co-authored-by: Xuan-Son Nguyen thichthat@gmail.com macOS/iOS: macOS Apple…

35
llama.cpp releases dev-tools 23d ago

b9575

ggml : add GGML_OP_COL2IM_1D ( #24206 ) cpu: add GGML_OP_COL2IM_1D Add the overlap-add (scatter-add) step of a 1D transposed convolution. A ConvTranspose1d factorizes as a GEMM followed by col2im: a weight pre-permuted to [IC, K OC] is contracted against the [IC, T_in] input…

4
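The overlap-add step described in the b9575 note can be sketched in plain Python. This is only an illustration under stated assumptions, not the ggml kernel: the function names, the row layout `kk * oc + c` for the GEMM result, and the absence of padding and dilation are all hypothetical choices made for the example.

```python
def gemm_cols(x, w_perm):
    """GEMM step: cols[r][t] = sum_i w_perm[i][r] * x[i][t].

    x is IC x T_in, w_perm is IC x (K*OC) (assumed pre-permuted layout).
    """
    ic, t_in, rows = len(x), len(x[0]), len(w_perm[0])
    return [[sum(w_perm[i][r] * x[i][t] for i in range(ic)) for t in range(t_in)]
            for r in range(rows)]


def col2im_1d(cols, oc, k, stride=1):
    """Overlap-add (scatter-add): sum column entries into overlapping output slots.

    Assumed row layout (hypothetical): row index = kk * oc + c.
    """
    t_in = len(cols[0])
    t_out = (t_in - 1) * stride + k  # output length with no padding or dilation
    out = [[0.0] * t_out for _ in range(oc)]
    for kk in range(k):
        for c in range(oc):
            row = cols[kk * oc + c]
            for t in range(t_in):
                out[c][t * stride + kk] += row[t]
    return out


def conv_transpose_1d(x, w_perm, oc, k, stride=1):
    """ConvTranspose1d factorized as a GEMM followed by col2im."""
    return col2im_1d(gemm_cols(x, w_perm), oc, k, stride)


# One input channel, one output channel, kernel [1, 1]:
print(conv_transpose_1d([[1.0, 2.0, 3.0]], [[1.0, 1.0]], oc=1, k=2))
# prints [[1.0, 3.0, 5.0, 3.0]]
```

With stride 1 the two kernel taps overlap, so neighbouring contributions are summed; with a stride equal to the kernel size they would land in disjoint slots.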
llama.cpp releases dev-tools 23d ago

b9574

server : do not clear slots without unified KV cache ( #24190 ) Always export idle slots to RAM Without this, a slot's VRAM cache may not be written to RAM. If this slot happens to be busy then later on, this triggers needless preprocessing in another slot. cont : clean-up…

33
llama.cpp releases dev-tools 23d ago

b9573

models : fix plamo2 attention_key/value_length regression ( #24317 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…

15
llama.cpp releases dev-tools 23d ago

b9572

ggml-cpu : fix rms_norm_back wrong output under in-place aliasing ( #24305 ) cont : clean-up comment Co-authored-by: Georgi Gerganov ggerganov@gmail.com

27
llama.cpp releases dev-tools 23d ago

b9571

Remove case for GGML_TYPE_Q4_K in mvvq.cu ( #23528 )

7
llama.cpp releases dev-tools 23d ago

b9570

ggml-webgpu: Add clang-format job ( #24308 ) try local formatting

34
llama.cpp releases dev-tools 23d ago

b9568

mtp: support for gemma-4 E2B and E4B assistants ( #24282 ) models: update converter to support smaller assistants models: add masked_embd tensors to gemma4-assist arch gemma-4: remove temp debug for conversion gemma-4-mtp: filter out masked_embedding tensors during conversion…

23
llama.cpp releases dev-tools 23d ago

b9567

server : do not parse when flushing http headers ( #24281 )

26
llama.cpp releases dev-tools 23d ago

b9566

graph: guard iswa kq_mask on its own buffer ( #24294 ) A SWA-only draft head (e.g. StepFun MTP) leaves the base sub-cache empty, so its kq_mask buffer stays null and asserts at load. Guard each mask on its own buffer in set_input and can_reuse, base and swa. Co-authored-by:…

23
llama.cpp releases dev-tools 23d ago

b9565

[ggml-webgpu] Handle buffer overlap / buffer aliasing for concat operator ( #24000 ) Only run webgpu CI on my fork Add webgpu only workflow handle buffer overlap case for concat operator restore build-webgpu.yml Co-Authored-By: Claude Sonnet 4.6 noreply@anthropic.com Run…

14
llama.cpp releases dev-tools 23d ago

b9564

[ggml-webgpu] Implement 2D workgroups for scale, binary, and unary ops ( #24044 ) Only run webgpu CI on my fork Add webgpu only workflow Implement 2d workgroups for more operations fix Fix type Move back to global_invocation_id

24
llama.cpp releases dev-tools 24d ago

b9562

mtmd : add video input support ( #24269 ) wip ok: lazy bitmap API remember to free lazy text wip add mtmd_helper_video support video input on server (base64 input) add MTMD_VIDEO config add timestamp update CLI cli: allow auto-completion for video add --video arg fix build…

22
llama.cpp releases dev-tools 24d ago

b9561

sync : ggml

13
llama.cpp releases dev-tools 24d ago

b9559

cli: fix spinner not show during prompt processing ( #24283 )

10
llama.cpp releases dev-tools 24d ago

b9563

docker: install ffmpeg in the released image ( #24302 )

24
llama.cpp releases dev-tools 24d ago

b9558

vulkan: Use cm2 decode_vector for mul_mat_id B matrix loads ( #23991 ) This allows vec4 loads of the B elements. Also increase BK to 64 when this is enabled. Neither of these alone is consistently faster, but together these give a nice speedup. In ggml-vulkan.cpp, we need to…

28
llama.cpp releases dev-tools 24d ago

b9557

cuda: reset cuda context after reading memory size ( #23935 ) cuda: reset device in get_memory function if no backend is active also count device and host buffers exclude hip and musa from counting and device reset use device mutex instead of atomic undo backend_free function…

34
llama.cpp releases dev-tools 24d ago

b9556

HIP: add gfx1152 and gfx1153 to RDNA3.5 ( #24129 )

10
llama.cpp releases dev-tools 24d ago

b9555

metal : fix im2col 1D case (audio models) ( #24220 )

29
llama.cpp releases dev-tools 24d ago

b9554

[SYCL] Update compute runtime version to 26.x in docker ( #24070 ) update compute runtime from 25 to 26 in docker add comment with old driver for multiple GPUs

12
llama.cpp releases dev-tools 24d ago

b9553

common : relax sampler name matching ( #23744 ) Currently, in some cases, the alternative names for samplers (like top-k and min-p instead of the canonical top_k and min_p) are not always recognized by the common_sampler_types_from_names…

32
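The idea of relaxed sampler name matching from the b9553 note can be illustrated with a small sketch. The note is truncated, so how llama.cpp actually relaxes the match is not shown here; the normalization rule below (lowercase, hyphens mapped to underscores) and all function names are assumptions made for the example, and only the two canonical names quoted in the note are included.

```python
# Hypothetical illustration, not the llama.cpp implementation.
CANONICAL_SAMPLERS = ("top_k", "min_p")  # only the names quoted in the release note


def normalize_sampler_name(name):
    """Assumed rule: trim, lowercase, and treat '-' like '_'."""
    return name.strip().lower().replace("-", "_")


def sampler_types_from_names(names):
    """Return canonical names for recognized entries; unknown names are skipped."""
    result = []
    for name in names:
        key = normalize_sampler_name(name)
        if key in CANONICAL_SAMPLERS:
            result.append(key)
    return result


print(sampler_types_from_names(["top-k", "Min-P", "top_k", "bogus"]))
# prints ['top_k', 'min_p', 'top_k']
```

Normalizing both spellings to one key before the lookup avoids maintaining a separate alias table for every hyphenated variant.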
llama.cpp releases dev-tools 24d ago

b9551

kv-cache : avoid kv cells copies ( #24277 )

7
llama.cpp releases dev-tools 25d ago

b9550

kv-cache: follow the source cache size when sharing cells ( #24267 ) A fitted target context can end up smaller than the draft default; the oversized assistant views then overflow the shared K/V tensors and trip the ggml_view_4d size assert during graph reserve.

25
llama.cpp releases dev-tools 25d ago

b9549

llama : add Gemma4 MTP ( #23398 )

18
llama.cpp releases dev-tools 25d ago

b9548

spec : fix vocab compatibility check ( #24256 )

14
llama.cpp releases dev-tools 25d ago

b9547

arg: Skip mmproj download when user supplied mmproj ( #24239 )

32
llama.cpp releases dev-tools 25d ago

b9544

common/chat : fix LFM2/LFM2.5 reasoning round-trip and leak ( #24234 ) common/chat : fix LFM2 reasoning round-trip and stray leak Gate by reasoning format and whether the template supports

30
llama.cpp releases dev-tools 25d ago

b9543

mtmd: support "frame merge" for qwen-vl-based models ( #21858 ) feat: add video support for Qwen3.5 various clean up revise the design fix llava-uhd case nits nits 2 Co-authored-by: andrewmd5 1297077+andrewmd5@users.noreply.github.com

37
llama.cpp releases dev-tools 26d ago

b9542

completion : remove useless statics ( #24226 ) Signed-off-by: Adrien Gallouët angt@huggingface.co

6
llama.cpp releases dev-tools 26d ago

b9541

completion : fix format specifier in LOG_INF ( #24213 ) Signed-off-by: Adrien Gallouët angt@huggingface.co

7
llama.cpp releases dev-tools 26d ago

b9538

model : rename local n_layer_all variable ( #24209 )

15
llama.cpp releases dev-tools 26d ago

b9537

context : fix off-by-one comparisons to n_gpu_layers ( #24208 )

37
llama.cpp releases dev-tools 26d ago

b9536

opencl: improve get_rows, cpy, concat and q6_k flat gemv ( #24160 ) opencl: allow multiple workgroups for large rows opencl: improve small cpy opencl: packed concat for small input opencl: tweak flat q6_K gemv, increase N_DST and remap threads

27
llama.cpp releases dev-tools 26d ago

b9535

common/chat : unify and fix LFM2/LFM2.5 tool parser ( #24178 )

19
llama.cpp releases dev-tools 26d ago

b9534

vulkan: add fwht support for Intel with shmem reduction ( #23964 ) don't use N as workgroup size disable subgroup shuffle on MoltenVK AMD disable fwht shader on Intel Windows due to driver bug

21
llama.cpp releases dev-tools 26d ago

b9533

model: fix build failed ( #24193 )

28
llama.cpp releases dev-tools 27d ago

b9531

TP: round up granularity to 128 ( #24180 ) remove assert

25
llama.cpp releases dev-tools 27d ago

b9530

cli: fix model params not propagated ( #23893 ) Fixes #23847

21
llama.cpp releases dev-tools 27d ago

b9529

model : fix llama_model::n_gpu_layers() ( #24188 )

36
llama.cpp releases dev-tools 27d ago

b9528

ui: run npm install when package-lock.json is newer than node_modules ( #24171 )

17
llama.cpp releases dev-tools 27d ago

b9524

minor : fix lint issues ( #24165 )

17
llama.cpp releases dev-tools 27d ago

b9523

hparams : refactor hparams.n_layer ( #24060 ) cont : remove n_layer_kv(), use n_layer_all instead cont : type consistency pi : update SYSTEM.md models : fix Step3.5 MTP cont : remove duplicate switch cases cont : explicitly set false to extra…

30
llama.cpp releases dev-tools 27d ago

b9522

kleidiai : dynamic chunk-based scheduling for hybrid execution ( #23819 )

16
llama.cpp releases dev-tools 27d ago

b9521

CUDA: enroll mul_mat_vec_q_moe into pdl ( #24087 ) Enroll mul_mat_vec_q_moe into PDL, boosting MTP performance on BW Data collected on a B4500: Before (llama.cpp) ➜ llama.cpp git:(master) ✗ python mtp-bench.py code_python pred= 192 draft= 150 acc= 116 rate=0.773 tok/s=202.8…

10
llama.cpp releases dev-tools 27d ago

b9519

sycl : port multi-column MMVQ from CUDA backend ( #21845 ) mmvq: Port the ncols_dst optimization from ggml-cuda/mmvq.cu to SYCL. Read weights once per dispatch instead of once per column. Covers all standard quant types + reorder paths for Q4_0, Q8_0, Q3_K, Q4_K, Q5_K, Q6_K. IQ…

4
llama.cpp releases dev-tools 27d ago

b9518

server : disable on-device spec checkpoints ( #24108 )

15
llama.cpp releases dev-tools 28d ago

b9515

Move duplicated imatrix code into single common imatrix-loader.cpp ( #22445 ) Deduplicate imatrix loading code Add back LLAMA_TRACE, early exit on quantize missing metadata

26
llama.cpp releases dev-tools 28d ago

b9512

return filter to save memory ( #24125 ) Co-authored-by: lvyichen lvyichen@stepfun.com

25
llama.cpp releases dev-tools 28d ago

b9510

ggml: vectorize ggml_vec_dot_q4_1_q8_1 with WASM SIMD128 ( #22209 ) Optimize the inner loop of ggml_vec_dot_q4_1_q8_1_generic using WASM SIMD128 intrinsics, gated behind #ifdef __wasm_simd128__ so non-wasm builds are…

11

Page 5 of 10 · 469 articles

Prismix · © 2026 · AI Hub