llama.cpp releases
468 articles archived
-
llama.cpp releases dev-tools 13h ago
b9859
opencl: allow loading precompiled binary kernels from library ( #23042 ) opencl: allow loading binary kernel opencl: add libdl.h ggml-backend-dl is in ggml, which depends on backend libs, thus ggml-opencl cannot depend on ggml-backend-dl; add libdl.h to break cyclic dep opencl:…
6 -
llama.cpp releases dev-tools 14h ago
b9858
common : use hf primary split as model path ( #25194 ) Fixes #25181 macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…
19 -
llama.cpp releases dev-tools 17h ago
b9857
hexagon: flash attention rework (optimizations, accuracy improvements, etc) ( #25085 ) hex-mm: fold mm quant tasks into the main matmul threads hex-mm: minor formatting fixes hex-mm: cleanup is_quant checks in dma dispatch hex-mm: fix dst-spad alignment hex-mm: move fp kernels…
5 -
llama.cpp releases dev-tools 20h ago
b9856
CUDA: consistent use of restrict + PDL for FA ( #25185 )
32 -
llama.cpp releases dev-tools 21h ago
b9855
ggml-cpu: add AVX2 optimization for nvfp4 dot product and use UE4M3 LUT ( #23961 )
6 -
llama.cpp releases dev-tools 1d ago
b9853
ui: Remove PWA navigate fallback to prevent caching API endpoint requ…
7 -
llama.cpp releases dev-tools 1d ago
b9852
opencl: initial q1_0 support ( #25160 ) opencl: general q1_0 support opencl: add Adreno GEMM/GEMV for q1_0
4 -
llama.cpp releases dev-tools 1d ago
b9851
cuda : prevent integer truncation and overflow errors when using KQ mask strides in flash_attn_mask_to_KV_max kernel ( #24945 ) Co-authored-by: Stanisław Szymczyk sszymczy@gmail.com
10 -
llama.cpp releases dev-tools 1d ago
b9850
model : register t_layer_inp for qwen3next ( #25141 ) Fix input assignment in layer processing loop Fix DFLASH for qwen-coder-next add line break Added tensor for attention normalization in Qwen3 model.
37 -
llama.cpp releases dev-tools 1d ago
b9849
common,server: handle bracketed IPv6 literals in URL authority ( #25140 ) common,server: handle bracketed IPv6 literals in URL authority Parse the [host]:port form (RFC 3986) and bracket IPv6 hosts when formatting a URL authority: listening log, proxy Host header, proxy log,…
5 -
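The `[host]:port` handling named in the b9849 entry can be illustrated with a short sketch. This is a minimal Python illustration of the RFC 3986 bracket rule, not the llama.cpp implementation; the function names `split_authority` and `format_authority` are hypothetical, and userinfo (`user@`) is assumed to be absent.

```python
from typing import Optional, Tuple


def split_authority(authority: str) -> Tuple[str, Optional[int]]:
    """Split 'host', 'host:port', '[v6]' or '[v6]:port' into (host, port)."""
    if authority.startswith("["):
        # Bracketed IPv6 literal: the host ends at the matching ']'.
        end = authority.find("]")
        if end == -1:
            raise ValueError("unterminated IPv6 literal")
        host = authority[1:end]
        rest = authority[end + 1:]
        if rest == "":
            return host, None
        if not rest.startswith(":"):
            raise ValueError("unexpected text after ']'")
        return host, int(rest[1:])
    # Unbracketed host: the last ':' (if any) separates the port.
    host, sep, port = authority.rpartition(":")
    if not sep:
        return authority, None
    return host, int(port)


def format_authority(host: str, port: Optional[int] = None) -> str:
    """Bracket the host when it contains ':' (an IPv6 literal), then append the port."""
    if ":" in host:
        host = f"[{host}]"
    return host if port is None else f"{host}:{port}"
```

Without the brackets, an address such as `::1` followed by `:8080` would be ambiguous, which is why the formatter adds them whenever the host itself contains a colon.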
llama.cpp releases dev-tools 1d ago
b9848
CUDA: fix get_rows_back for tables with more than 65535 rows (grid-y clamp + stride) ( #25103 )
9 -
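The "grid-y clamp + stride" pattern named in the b9848 entry can be sketched on the host side. CUDA limits `gridDim.y` to 65535, so a launch that maps one row to one y-block has to clamp the grid and let each block step through several rows. The simulation below is a Python illustration of that general pattern under this assumption, not the actual kernel; `launch_rows` and `kernel_row` are hypothetical names.

```python
MAX_GRID_Y = 65535  # CUDA upper bound on gridDim.y


def launch_rows(nrows, kernel_row):
    """Simulate a launch where each y-block handles rows block_y, block_y + grid_y, ..."""
    grid_y = min(nrows, MAX_GRID_Y)      # clamp the grid to the hardware limit
    for block_y in range(grid_y):        # stands in for blockIdx.y
        row = block_y
        while row < nrows:               # stride loop inside the kernel
            kernel_row(row)
            row += grid_y                # stands in for gridDim.y
    return grid_y
```

With 70000 rows the grid is clamped to 65535 blocks, and the first 4465 blocks each process a second row, so every row is still visited exactly once.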
llama.cpp releases dev-tools 1d ago
b9847
CUDA: fix Gemma E4B MTP FlashAttention ( #25148 ) CUDA: fix Gemma E4B MTP FlashAttention remove unused template declaration
16 -
llama.cpp releases dev-tools 1d ago
b9846
vulkan: roll bk loop in matmul for asahi linux ( #24663 ) vulkan: roll bk loop in matmul for asahi linux vulkan: fix inline comment vulkan: revert BK-loop unroll change vulkan: edit spirv directly for asahi roll bk loop vulkan: remove trailing whitespace at the end of comments…
6 -
llama.cpp releases dev-tools 1d ago
b9844
ggml-webgpu: add support for NVFP4 ( #25143 )
19 -
llama.cpp releases dev-tools 2d ago
b9843
Revert "sched : reintroduce less synchronizations during split compute ( #20793 )" ( #25138 )
33 -
llama.cpp releases dev-tools 2d ago
b9842
common : dedup preset and cached model entries in /v1/models ( #25131 ) Signed-off-by: Adrien Gallouët angt@huggingface.co
29 -
llama.cpp releases dev-tools 2d ago
b9840
DeepSeek V4 ( #24162 ) convert: add dsv4 conversion add basic setup add llm_graph_input_dsv4 add save-load state add sinkhorn eps - correction by @fairydreaming add rope fix cleanup dead code fix bugs support pro model: added by @fairydreaming remove redundant V cache Chat…
26 -
llama.cpp releases dev-tools 2d ago
b9839
tools/ui: restore Tailwind scanning in ignored worktrees ( #24879 )
24 -
llama.cpp releases dev-tools 3d ago
b9838
common : remove unused regex-partial ( #25118 )
24 -
llama.cpp releases dev-tools 3d ago
b9837
jinja, chat: add --reasoning-preserve flag ( #25105 ) jinja, chat: add --reasoning-preserve flag correct help message
28 -
llama.cpp releases dev-tools 3d ago
b9835
ui: fix stop and reasoning skip in single-model mode ( #25084 )
15 -
llama.cpp releases dev-tools 3d ago
b9833
chat : implement minicpm5 parser ( #24889 ) Add minicpm5 tool call parser Refactor MiniCPM5 PEG parser per review feedback Fix jinja min/max API to match Jinja2 modify by review MiniCPM5: use autoparser for XML tool calls and fix grammar preserved-token triggers MiniCPM5: fix…
26 -
llama.cpp releases dev-tools 3d ago
b9832
jinja: add --dump-prog for debugging ( #25086 ) jinja: add --dump-prog for debugging Update common/jinja/runtime.cpp Co-authored-by: Sigbjørn Skjæret 1629204+CISC@users.noreply.github.com
21 -
llama.cpp releases dev-tools 3d ago
b9831
spec : add DFlash support ( #22105 ) spec: add DFlash v2 support dflash: support sliding window attention per layer_types docs: add dflash section Co-authored-by: Kashif Rasul kashif.rasul@gmail.com
12 -
llama.cpp releases dev-tools 3d ago
b9830
common : allow --offline in llama download ( #25091 ) Expose the existing --offline flag to llama download so a script can run it to check whether a model is already cached and ready to be served without touching the network. Also fix a latent use-after-free in the URL-task…
4 -
llama.cpp releases dev-tools 4d ago
b9829
logs : reduce v2 ( #25078 ) server : reduce logs cont : common cont : spec cont : CMN_ -> COM_
11 -
llama.cpp releases dev-tools 4d ago
b9828
opencl: flash attention improvement ( #25069 ) opencl: rework FA kernel for f16 and f32 opencl: flash-attention prefill prepass kernels flash_attn_kv_pad_f16 pads the tail KV tile to a BLOCK_N multiple flash_attn_mask_pad_f16 pads the matching mask tile flash_attn_blk_f16…
13 -
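The b9828 entry says the prepass pads the tail KV tile to a `BLOCK_N` multiple. The arithmetic behind that kind of padding can be sketched as below. This is an illustrative Python sketch only: the helper names are hypothetical, and the block size and fill value are passed in as parameters because the entry does not state what the kernels use.

```python
def padded_length(n_kv, block_n):
    """Round n_kv up to the next multiple of block_n."""
    return (n_kv + block_n - 1) // block_n * block_n


def pad_tail(rows, block_n, fill):
    """Append fill entries so the last tile is a full block_n rows."""
    missing = padded_length(len(rows), block_n) - len(rows)
    return rows + [fill] * missing
```

For example, a KV length of 100 with a block size of 32 rounds up to 128, so the last tile gains 28 padding rows and the main kernel can process full tiles without a bounds check on the tail.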
llama.cpp releases dev-tools 4d ago
b9827
[CUDA] Added a cudaMemcpy2DAsync fast path to ggml_cuda_cpy ( #25057 ) [CUDA] Added a cudaMemcpy2DAsync fast path to ggml_cuda_cpy Add a CUDA ggml_cpy fast path for same-type, same-shape strided copies that are just 2D pitched block copies. When tensors are not fully contiguous…
14 -
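The b9827 entry describes strided copies that reduce to 2D pitched block copies. The semantics of such a copy, as performed by `cudaMemcpy2DAsync(dst, dpitch, src, spitch, width, height, kind, stream)`, can be modelled on the host as follows. This is a Python model of the copy semantics only, with a hypothetical helper name `memcpy_2d`; it does not reproduce the conditions under which `ggml_cuda_cpy` takes the fast path.

```python
def memcpy_2d(dst, dpitch, src, spitch, width, height):
    """Copy `height` rows of `width` bytes; rows start `spitch`/`dpitch` bytes apart."""
    if width > dpitch or width > spitch:
        raise ValueError("width must not exceed either pitch")
    if height > 0:
        # Make sure the last row fits in both buffers before copying anything.
        if (height - 1) * spitch + width > len(src):
            raise ValueError("source buffer too small")
        if (height - 1) * dpitch + width > len(dst):
            raise ValueError("destination buffer too small")
    for row in range(height):
        s = row * spitch
        d = row * dpitch
        dst[d:d + width] = src[s:s + width]


# Usage: pack three 3-byte rows out of a source whose rows are 4 bytes apart.
src = bytearray(range(12))
dst = bytearray(9)
memcpy_2d(dst, 3, src, 4, 3, 3)
```

A tensor whose rows are contiguous internally but separated by a fixed stride fits this shape, which is why a single pitched copy can replace a per-element copy kernel in that case.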
llama.cpp releases dev-tools 4d ago
b9826
sycl : fix failed ut cases of norm ( #25044 )
13 -
llama.cpp releases dev-tools 4d ago
b9825
vulkan: fix step operator for 0 input ( #25036 )
9 -
llama.cpp releases dev-tools 4d ago
b9824
binaries : Improve rpc-server and export-graph-ops names. ( #25045 ) Tests are generally prefixed with -test, so rename export-graph-ops accordingly. rpc-server is probably too generic a name for /usr/bin. Because it should work with any ggml application, it is renamed to…
20 -
llama.cpp releases dev-tools 4d ago
b9823
ci : add windows-openvino to check-release ( #25022 )
11 -
llama.cpp releases dev-tools 4d ago
b9822
tests : fix test-chat-template --no-common option ( #25075 )
24 -
llama.cpp releases dev-tools 5d ago
b9821
app : allow --version, --licenses & --help ( #25054 ) Signed-off-by: Adrien Gallouët angt@huggingface.co
22 -
llama.cpp releases dev-tools 5d ago
b9820
sched : reintroduce less synchronizations during split compute ( #20793 ) CUDA: Improve performance via less synchronizations between token ( #17795 ) Adds CPU-to-CUDA copy capability to ggml_backend_cuda_cpy_tensor_async() Adds function to relax sync requirements between input…
26 -
llama.cpp releases dev-tools 5d ago
b9817
openvino: Update to OV 2026.2.1, self-contained release packages, operator improvements ( #24974 ) Update to OV 2026.2.1, Make OV release packages self-contained OpenVINO Backend: Remove compute_op_type hardcoded…
23 -
llama.cpp releases dev-tools 5d ago
b9816
sync : ggml
35 -
llama.cpp releases dev-tools 5d ago
b9814
vulkan: opt mul_mat_vecq for mi50 ( #22933 )
29 -
llama.cpp releases dev-tools 5d ago
b9813
vulkan: add INTEL_XE1 arch enum and enable coopmat1 on Intel Xe-LPG Plus ( #24404 ) vulkan: add INTEL_PRE_XE2 arch enum and enable coopmat1 on Intel Xe-LPG Plus (1/3, Xe1-ARLH) Co-authored-by: Xia, Jie jie.xia@intel.com Co-authored-by: Liu, Russell russell.liu@intel.com Address…
23 -
llama.cpp releases dev-tools 5d ago
b9811
vulkan: Workaround compiler bug in conv2d coopmat2 path ( #24924 ) vulkan: Workaround compiler bug in conv2d coopmat2 path apply same workaround to CONV_3D Apply suggestion from @jeffbolznv
38 -
llama.cpp releases dev-tools 5d ago
b9810
CUDA: add cublasSgemmBatched mapping for HIP/MUSA vendor headers ( #25033 )
31 -
llama.cpp releases dev-tools 6d ago
b9804
mamba2: remove hardcoded 2x expansion factor and invalid d_inner % d_state check ( #23082 ) mamba2: remove hardcoded 2x expansion factor, support any expand value mamba2: remove invalid d_inner % d_state check (unrelated parameters) Update convert_hf_to_gguf.py: make expand…
21 -
llama.cpp releases dev-tools 6d ago
b9803
opencl: flush profiling batch at shutdown for incomplete batches ( #25016 )
7 -
llama.cpp releases dev-tools 6d ago
b9802
macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64 (ROCm 7.2) Ubuntu x64 (OpenVINO)…
15 -
llama.cpp releases dev-tools 6d ago
b9789
quant : fix quantizing moe with mtp ( #24986 )
5 -
llama.cpp releases dev-tools 6d ago
b9788
sycl : support --split-mode tensor ( #24152 ) Sycl tp stage1 ( #1 ) SYCL: tensor parallelism (--split-mode tensor) for dual-GPU Adds the comm_init/comm_free/comm_allreduce_tensor trio that the meta-backend queries via get_proc_address to enable backend-specific all-reduce,…
33 -
llama.cpp releases dev-tools 7d ago
b9786
opencl: support non-contig rows in norm ( #24965 )
8 -
llama.cpp releases dev-tools 7d ago
b9785
chat: harden caps check ( #24973 )
23 -
llama.cpp releases dev-tools 7d ago
b9784
hexagon: MUL_MAT and MUL_MAT_ID rework : 32x32 tiled weight repack, kernel-params, cached graphs ( #24954 ) hex-mm: new weight layout and fusion updates hvx-mm: unroll the new tiled vec_dots to optimize hvx register util hex-mm: optimize dyn.quant format for q8_0 and q8_1 to…
36