llama.cpp releases

468 articles archived

llama.cpp releases dev-tools 13h ago

b9859

opencl: allow loading precompiled binary kernels from library ( #23042 ) opencl: allow loading binary kernel opencl: add libdl.h ggml-backend-dl is in ggml, which depends backend libs, thus ggml-opencl cannot depend on ggml-backend-dl add libdl.h to break cyclic dep opencl:…

6
llama.cpp releases dev-tools 14h ago

b9858

common : use hf primary split as model path ( #25194 ) Fixes #25181 macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…

19
llama.cpp releases dev-tools 17h ago

b9857

hexagon: flash attention rework (optimizations, accuracy improvements, etc) ( #25085 ) hex-mm: fold mm quant tasks into the main matmul threads hex-mm: minor formatting fixes hex-mm: cleanup is_quant checks in dma dispatch hex-mm: fix dst-spad alignment hex-mm: move fp kernels…

5
llama.cpp releases dev-tools 20h ago

b9856

CUDA: consistent use of restrict + PDL for FA ( #25185 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

32
llama.cpp releases dev-tools 21h ago

b9855

ggml-cpu: add AVX2 optimization for nvfp4 dot product and use UE4M3 LUT ( #23961 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu…

6
llama.cpp releases dev-tools 1d ago

b9853

ui: Remove PWA navigate fallback to prevent caching API endpoint requ…

7
llama.cpp releases dev-tools 1d ago

b9852

opencl: initial q1_0 support ( #25160 ) opencl: general q1_0 support opencl: add Adreno GEMM/GEMV for q1_0 macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU)…

4
llama.cpp releases dev-tools 1d ago

b9851

cuda : prevent integer truncation and overflow errors when using KQ mask strides in flash_attn_mask_to_KV_max kernel ( #24945 ) Co-authored-by: Stanisław Szymczyk sszymczy@gmail.com macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED…

10
llama.cpp releases dev-tools 1d ago

b9850

model : register t_layer_inp for qwen3next ( #25141 ) Fix input assignment in layer processing loop Fix DFLASH for qwen-coder-next add line break Added tensor for attention normalization in Qwen3 model. macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI…

37
llama.cpp releases dev-tools 1d ago

b9849

common,server: handle bracketed IPv6 literals in URL authority ( #25140 ) common,server: handle bracketed IPv6 literals in URL authority Parse the [host]:port form (RFC 3986) and bracket IPv6 hosts when formatting a URL authority: listening log, proxy Host header, proxy log,…

5
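
The bracketed `[host]:port` form mentioned in the b9849 entry can be sketched as follows. This is a minimal illustration of the RFC 3986 authority convention, not the llama.cpp implementation; the function names `split_authority` and `format_authority` are hypothetical, and userinfo (`user@`) is deliberately not handled.

```python
from typing import Optional, Tuple


def split_authority(authority: str) -> Tuple[str, Optional[int]]:
    """Split 'host', 'host:port', '[v6]' or '[v6]:port' into (host, port)."""
    if authority.startswith("["):
        # IPv6 literal: everything up to the closing bracket is the host.
        end = authority.find("]")
        if end == -1:
            raise ValueError(f"unterminated IPv6 literal: {authority!r}")
        host = authority[1:end]
        rest = authority[end + 1:]
        if rest and not rest.startswith(":"):
            raise ValueError(f"unexpected text after ']': {authority!r}")
        port_text = rest[1:]
    else:
        # Plain host name or IPv4 address: at most one colon, before the port.
        host, _, port_text = authority.partition(":")
    return host, (int(port_text) if port_text else None)


def format_authority(host: str, port: Optional[int] = None) -> str:
    """Re-add brackets around IPv6 hosts so the port separator stays unambiguous."""
    if ":" in host and not host.startswith("["):
        host = f"[{host}]"
    return host if port is None else f"{host}:{port}"
```

Without the brackets, `::1:8080` cannot be split reliably, because the colons inside the address are indistinguishable from the colon that introduces the port.
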
llama.cpp releases dev-tools 1d ago

b9848

CUDA: fix get_rows_back for tables with more than 65535 rows (grid-y clamp + stride) ( #25103 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x…

9
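
The "grid-y clamp + stride" pattern named in the b9848 title can be illustrated with a small host-side simulation. CUDA limits `gridDim.y` to 65535, so a launch that maps one block to each row has to cap the grid and let each block step through several rows. The sketch below only models that indexing in Python; `launch_rows` is a hypothetical name and the code is not taken from the actual kernel.

```python
MAX_GRID_Y = 65535  # CUDA limit for gridDim.y


def launch_rows(n_rows: int):
    """Simulate a clamped grid whose blocks walk the rows with a stride."""
    grid_y = min(n_rows, MAX_GRID_Y)  # clamp instead of requesting n_rows blocks
    visits = [0] * n_rows
    for block_y in range(grid_y):  # stands in for blockIdx.y
        # Each block handles rows block_y, block_y + grid_y, block_y + 2*grid_y, ...
        for row in range(block_y, n_rows, grid_y):
            visits[row] += 1
    return grid_y, visits


grid_y, visits = launch_rows(70_000)
```

With 70,000 rows the grid is capped at 65535 blocks, and the strided loop still touches every row exactly once.
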
llama.cpp releases dev-tools 1d ago

b9847

CUDA: fix Gemma E4B MTP FlashAttention ( #25148 ) CUDA: fix Gemma E4B MTP FlashAttention remove unused template declaration macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU)…

16
llama.cpp releases dev-tools 1d ago

b9846

vulkan: roll bk loop in matmul for asahi linux ( #24663 ) vulkan: roll bk loop in matmul for asahi linux vulkan: fix inline comment vulkan: revert BK-loop unroll change vulkan: edit spirv directly for asahi roll bk loop vulkan: remove trailing whitespace at the end of comments…

6
llama.cpp releases dev-tools 1d ago

b9844

ggml-webgpu: add support for NVFP4 ( #25143 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…

19
llama.cpp releases dev-tools 2d ago

b9843

Revert "sched : reintroduce less synchronizations during split compute ( #20793 )" ( #25138 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x…

33
llama.cpp releases dev-tools 2d ago

b9842

common : dedup preset and cached model entries in /v1/models ( #25131 ) Signed-off-by: Adrien Gallouët angt@huggingface.co macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU)…

29
llama.cpp releases dev-tools 2d ago

b9840

DeepSeek V4 ( #24162 ) convert: add dsv4 conversion add basic setup add llm_graph_input_dsv4 add save-load state add sinkhorn eps - correction by @fairydreaming add rope fix cleanup dead code fix bugs support pro model: added by @fairydreaming remove redundant V cache Chat…

26
llama.cpp releases dev-tools 2d ago

b9839

tools/ui: restore Tailwind scanning in ignored worktrees ( #24879 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…

24
llama.cpp releases dev-tools 3d ago

b9838

common : remove unused regex-partial ( #25118 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…

24
llama.cpp releases dev-tools 3d ago

b9837

jinja, chat: add --reasoning-preserve flag ( #25105 ) jinja, chat: add --reasoning-preserve flag correct help message macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu…

28
llama.cpp releases dev-tools 3d ago

b9835

ui: fix stop and reasoning skip in single-model mode ( #25084 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…

15
llama.cpp releases dev-tools 3d ago

b9833

chat : implement minicpm5 parser ( #24889 ) Add minicpm5 tool call parser Refactor MiniCPM5 PEG parser per review feedback Fix jinja min/max API to match Jinja2 modify by review MiniCPM5: use autoparser for XML tool calls and fix grammar preserved-token triggers MiniCPM5: fix…

26
llama.cpp releases dev-tools 3d ago

b9832

jinja: add --dump-prog for debugging ( #25086 ) jinja: add --dump-prog for debugging Update common/jinja/runtime.cpp Co-authored-by: Sigbjørn Skjæret 1629204+CISC@users.noreply.github.com Co-authored-by: Sigbjørn Skjæret 1629204+CISC@users.noreply.github.com macOS/iOS: macOS…

21
llama.cpp releases dev-tools 3d ago

b9831

spec : add DFlash support ( #22105 ) spec: add DFlash v2 support dflash: support sliding window attention per layer_types docs: add dflash section Co-authored-by: Kashif Rasul kashif.rasul@gmail.com macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI…

12
llama.cpp releases dev-tools 3d ago

b9830

common : allow --offline in llama download ( #25091 ) Expose the existing --offline flag to llama download so a script can run it to check whether a model is already cached and ready to be served without touching the network. Also fix a latent use-after-free in the URL-task…

4
llama.cpp releases dev-tools 4d ago

b9829

logs : reduce v2 ( #25078 ) server : reduce logs cont : common cont : spec cont : CMN_ -> COM_ macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x…

11
llama.cpp releases dev-tools 4d ago

b9828

opencl: flash attention improvement ( #25069 ) opencl: rework FA kernel for f16 and f32 opencl: flash-attention prefill prepass kernels flash_attn_kv_pad_f16 pads the tail KV tile to a BLOCK_N multiple flash_attn_mask_pad_f16 pads the matching mask tile flash_attn_blk_f16…

13
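
Padding a tail tile to a `BLOCK_N` multiple, as the b9828 entry describes for `flash_attn_kv_pad_f16`, comes down to rounding a length up to the next multiple of the block size. A minimal sketch of that arithmetic, assuming a hypothetical `BLOCK_N` of 64 (the real value is not given in the entry):

```python
def pad_to_multiple(n: int, block: int) -> int:
    """Round n up to the nearest multiple of block."""
    if block <= 0:
        raise ValueError("block must be positive")
    return (n + block - 1) // block * block


BLOCK_N = 64   # hypothetical tile width, chosen for illustration only
n_kv = 1000    # example KV length that is not a multiple of BLOCK_N

padded = pad_to_multiple(n_kv, BLOCK_N)
tail_padding = padded - n_kv  # number of extra positions in the last tile
```

Here `padded` is 1024 and `tail_padding` is 24, so every tile seen by the kernel is full width and the last 24 positions carry padding rather than real keys and values.
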
llama.cpp releases dev-tools 4d ago

b9827

[CUDA] Added a cudaMemcpy2DAsync fast path to ggml_cuda_cpy ( #25057 ) [CUDA] Added a cudaMemcpy2DAsync fast path to ggml_cuda_cpy Add a CUDA ggml_cpy fast path for same-type, same-shape strided copies that are just 2D pitched block copies. When tensors are not fully contiguous…

14
llama.cpp releases dev-tools 4d ago

b9826

sycl : fix failed ut cases of norm ( #25044 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…

13
llama.cpp releases dev-tools 4d ago

b9825

vulkan: fix step operator for 0 input ( #25036 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

9
llama.cpp releases dev-tools 4d ago

b9824

binaries : Improve rpc-server and export-graph-ops names. ( #25045 ) Tests are generally prefixed with -test, so rename export-graph-ops accordingly. rpc-server is probably too generic a name for /usr/bin. Because it should work with any ggml application, it is renamed to…

20
llama.cpp releases dev-tools 4d ago

b9823

ci : add windows-openvino to check-release ( #25022 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

11
llama.cpp releases dev-tools 4d ago

b9822

tests : fix test-chat-template --no-common option ( #25075 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…

24
llama.cpp releases dev-tools 5d ago

b9821

app : allow --version, --licenses & --help ( #25054 ) Signed-off-by: Adrien Gallouët angt@huggingface.co macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU)…

22
llama.cpp releases dev-tools 5d ago

b9820

sched : reintroduce less synchronizations during split compute ( #20793 ) CUDA: Improve performance via less synchronizations between token ( #17795 ) Adds CPU-to-CUDA copy capability to ggml_backend_cuda_cpy_tensor_async() Adds function to relax sync requirements between input…

26
llama.cpp releases dev-tools 5d ago

b9817

openvino: Update to OV 2026.2.1, self-contained release packages, operator improvements ( #24974 ) Update to OV 2026.2.1, Make OV release packages self-contained Update to OV 2026.2.1, Make OV release packages self-contained OpenVINO Backend: Remove compute_op_type hardcoded…

23
llama.cpp releases dev-tools 5d ago

b9816

sync : ggml macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64 (ROCm 7.2) Ubuntu x64…

35
llama.cpp releases dev-tools 5d ago

b9814

vulkan: opt mul_mat_vecq for mi50 ( #22933 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…

29
llama.cpp releases dev-tools 5d ago

b9813

vulkan: add INTEL_XE1 arch enum and enable coopmat1 on Intel Xe-LPG Plus ( #24404 ) vulkan: add INTEL_PRE_XE2 arch enum and enable coopmat1 on Intel Xe-LPG Plus (1/3, Xe1-ARLH) Co-authored-by: Xia, Jie jie.xia@intel.com Co-authored-by: Liu, Russell russell.liu@intel.com Address…

23
llama.cpp releases dev-tools 5d ago

b9811

vulkan: Workaround compiler bug in conv2d coopmat2 path ( #24924 ) vulkan: Workaround compiler bug in conv2d coopmat2 path apply same workaround to CONV_3D Apply suggestion from @jeffbolznv macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled)…

38
llama.cpp releases dev-tools 5d ago

b9810

CUDA: add cublasSgemmBatched mapping for HIP/MUSA vendor headers ( #25033 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64…

31
llama.cpp releases dev-tools 6d ago

b9804

mamba2: remove hardcoded 2x expansion factor and invalid d_inner % d_state check ( #23082 ) mamba2: remove hardcoded 2x expansion factor, support any expand value mamba2: remove invalid d_inner % d_state check (unrelated parameters) Update convert_hf_to_gguf.py: make expand…

21
llama.cpp releases dev-tools 6d ago

b9803

opencl: flush profiling batch at shutdown for incomplete batches ( #25016 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64…

7
llama.cpp releases dev-tools 6d ago

b9802

macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64 (ROCm 7.2) Ubuntu x64 (OpenVINO)…

15
llama.cpp releases dev-tools 6d ago

b9789

quant : fix quantizing moe with mtp ( #24986 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…

5
llama.cpp releases dev-tools 6d ago

b9788

sycl : support --split-mode tensor ( #24152 ) Sycl tp stage1 ( #1 ) SYCL: tensor parallelism (--split-mode tensor) for dual-GPU Adds the comm_init/comm_free/comm_allreduce_tensor trio that the meta-backend queries via get_proc_address to enable backend-specific all-reduce,…

33
llama.cpp releases dev-tools 7d ago

b9787

sycl : fix the failed UT cases of conv_3d ( #24900 )

20
llama.cpp releases dev-tools 7d ago

b9786

opencl: support non-contig rows in norm ( #24965 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

8
llama.cpp releases dev-tools 7d ago

b9785

chat: harden caps check ( #24973 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64…

23
llama.cpp releases dev-tools 7d ago

b9784

hexagon: MUL_MAT and MUL_MAT_ID rework : 32x32 tiled weight repack, kernel-params, cached graphs ( #24954 ) hex-mm: new weight layout and fusion updates hvx-mm: unroll the new tiled vec_dots to optimize hvx register util hex-mm: optimize dyn.quant format for q8_0 and q8_1 to…

36

Prismix · © 2026 · AI Hub
