llama.cpp releases
llama.cpp releases dev-tools 1mo ago
b9257
vulkan: optimize operations in the IM2COL shader ( #22685 ) vulkan: optimize operations in the IM2COL shader Add comments and improve the code formatting macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux:…
b9255
hexagon: HMX quantized matmul rework ( #23368 ) hmx-mm: update debug logging in hmx-mm hmx-mm: update dequant logic to use HVX_vector_x2/4 hmx-mm: remove non-pipelined version of the quantize matmul It seems that we don't really need non-pipelined version hmx-mm: use activation…
b9254
Programmatic Dependent Launch (PDL) for more performance on newer NVIDIA GPUs (Hopper+) ( #22522 ) Adds initial PDL setup. Adds PDL barriers based on simple heuristic: place "sync" before first input pointer access, and "launch" after last write, e.g. to tensors like dst.…
b9253
app : introduce the llama unified executable ( #23296 ) app : introduce the llama unified executable Signed-off-by: Adrien Gallouët angt@huggingface.co Use serve for server Signed-off-by: Adrien Gallouët angt@huggingface.co Hide completion and bench, add help command…
b9251
mtmd: fit_params now take into account mmproj ( #21489 ) mtmd: fit_params now take into account mmproj rename alloc_compute_meta to reserve_compute_meta rm unused functions add ggml_backend_dev_t support add debug log macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon…
b9247
metal : optimize pad + cpy ( #23354 ) metal : optimize pad metal : optimize cpy cont : better row packing in threadgroup macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64…
b9245
ggml-cuda: tune RDNA3 Q6_K MMVQ nwarps ( #23349 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu…
b9244
opencl: add MoE support for q4_k, q5_k, q6_k on Adreno ( #23303 ) opencl: add q4_k moe support opencl: add q5_k moe support opencl: add q6_k moe support opencl: adjust format Co-authored-by: Li He lih@qti.qualcomm.com macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon…
b9243
hexagon: add MROPE and IMROPE support in HTP rope op ( #23317 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…
b9235
llama : MTP clean-up ( #23269 ) llama : disable equal splits for recurrent memory with partial rollback spec : re-enable p-min with MTP drafts spec : re-enable ngram spec in combination with RS rollback spec : fix ngram-map-* params spec : fix acceptance logic in combined ngram…
b9240
common: fix --help for --verbosity ( #23278 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64…
b9239
common: fix --fit verbosity with --verbosity 4 ( #23282 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…
b9222
hexagon: add support for TRI op ( #22822 ) Hexagon: TRI HVX Kernel addition to ggml hexagon HTP ops and context addressed PR review comments for TRI op hexagon: clang format hex-unary: remove merge conflict markers hex-ggml: remove duplicate op cases (merge conflict) hex-ggml:…
b9221
ggml-hexagon: add PAD op HVX kernel ( #23078 ) ggml-hexagon: add PAD op HVX kernel Implements GGML_OP_PAD on the Hexagon HTP backend using HVX vectorized kernels. Supports zero-padding and circular padding across all 4 tensor dimensions. hex-ggml: remove duplicate op cases…
b9219
common : remove hf cache migration ( #23266 ) Signed-off-by: Adrien Gallouët angt@huggingface.co macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU)…
b9216
ui: Refactor models store, MCP service, and gate logs behind VITE_DEBUG ( #23236 ) refactor: Scope console logs to DEV + VITE_DEBUG env vars refactor: skip MCP proxy probe when no server requires it refactor: suppress expected disconnect errors during MCP client shutdown…
b9213
llama: initialize pre-norm embedding mask flag ( #23256 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…
b9208
sycl: route small f32 matmuls to oneMKL, bypass oneDNN ( #22150 ) Signed-off-by: Chun Tao chun.tao@intel.com Co-authored-by: Chun Tao chun.tao@intel.com macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux:…
b9209
sycl: scalar SWAR byte-subtract in Q6_K MMVQ dot product ( #22156 ) Signed-off-by: Chun Tao chun.tao@intel.com Co-authored-by: Chun Tao chun.tao@intel.com
b9204
feat: Support d_conv=15 for ssm-conv.cu ( #23017 ) Branch: ModalityConditionalAdapters AI-usage: none Signed-off-by: Gabe Goodhart ghart@us.ibm.com macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux:…
b9203
cmake : fix LLAMA_BUILD_UI logic ( #23190 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64…
b9202
cmake : do not install conversion script ( #23204 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…
b9200
llama: avoid copying logits during prompt decode in MTP ( #23198 ) llama: avoid copying logits during prompt decode in MTP review: update comment llama-graph: call set_output for t_h_pre_norm macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled)…
b9198
ggml-vulkan/CMakeLists: add a check for SPIRV-Headers ( #22009 ) ci/run: set explicit SPIR-V Headers search path for macOS vulkan CI For whatever reason, the files are under additional sub-path vulkan/ under the cmake directory, which does not match either current LunarG macOS…
b9197
vulkan: add cpy bf16 -> f32 pipelines ( #22677 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu…
b9196
vulkan: Support unaligned tensors for ROPE ( #22637 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…
b9194
vulkan: fuse SSM_CONV + BIAS + SILU ( #22653 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu…
b9193
server : honor --embd-normalize CLI arg ( #23125 ) The --embd-normalize flag was registered only for the embedding and debug examples, so llama-server rejected it and the /embedding handler used a hard-coded default of 2 (L2). Add LLAMA_EXAMPLE_SERVER to the flag's example set…
b9192
ngram : reduce noisy logs ( #23185 ) ngram : reduce noisy logs ngram : reduce noisy logs macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu…
b9190
server: (router) alloc tmp buffer on heap ( #23159 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…
b9189
server: skip device enumeration in router mode to avoid creating CUDA primary context ( #23137 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU)…
b9186
sync : ggml macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64 (ROCm 7.2) Ubuntu x64 (OpenVINO)…
b9181
vendor : update cpp-httplib to 0.45.0 ( #23103 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu…
b9180
llama + spec: MTP Support ( #22673 ) spec: support MTP fix batch size rename files cont : simplify ( #7 ) MTP: clean-up ( #9 ) MTP: clean-up review: use llama_context_type instead of llama_graph_type review: remove llama_model_has_mtp review: fix convert issues convert: fix…
b9174
ui: Restructure repo to use tools/ui folder and ui / UI / llama-ui / LLAMA_UI naming ( #23064 ) webui: Move static build output from tools/server/public to build/ui directory refactor: Move to tools/ui refactor: rename CMake variables and preprocessor defines Rename…
b9173
ci : fix release symlinks ( #23119 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64 (ROCm…
b9172
webui: Use lowercase hash for HF checksum check ( #23107 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…
b9169
mtmd: add chunks and fix preproc for qwen3a ( #23073 ) mtmd: add chunks and fix preproc for qwen3a add attn_mask limit mtmd_chunk size (avoid blow up memory) correct audio tokens re-order the set_input case remove attn_mask macOS/iOS: macOS Apple Silicon (arm64) macOS Apple…
b9165
ci : fix transform of top . entry in release archive ( #23080 ) fix transform of top . entry in release archive simplify macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64…
b9163
reasoning-budget: clone should do a deep-copy ( #23095 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…
b9161
Support for Codex CLI by skipping unsupported Responses tools ( #23041 ) Support for Codex CLI by skipping unsupported Responses tools Warn on skipped Responses tools and preserve gpt-oss apply_patch rejection Revert gpt-oss apply_patch special handling macOS/iOS: macOS Apple…
b9159
ggml-hexagon: cpy: add contiguous fast-path in reshape copy ( #23076 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…
b9158
HIP: RDNA3 mma FA, faster AMD transpose, tune AMD ( #22880 ) Adds RDNA3 support to the CUDA mma FA kernel. To make the RDNA3 tensor cores work with the FP16 accumulation for VKQ, the tiles need to be 32 logical units long in the direction of the attention head; for head sizes 80…
b9156
ggml-webgpu: Enable NVIDIA self-hosted CI ( #22976 ) Enable nvidia ci for webgpu Address precision issues fix placement Relax more set_rows and div Try relaxing all f16 formatting and naming Add comment explaining max_nmse_err logic Added comment referencing pull request for…
b9151
logs : reduce ( #23021 ) logs : reduce args : fix envs server : fix build common : print verbosity level at start server : clean-up logs server : print prompt processing timings + sampling params minor : whitespaces macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon…
b9150
ggml-cpu: Add IME2 Instruction Support for the SpacemiT Backend ( #22863 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…
b9148
unicode,test: add Qwen3.5 non-backtracking tokenizer handler and regr… ( #22110 ) unicode,test: add Qwen3.5 non-backtracking tokenizer handler and regression tests Add unicode_regex_split_custom_qwen35() to src/unicode.cpp , a non-backtracking handler for Qwen3.5's [\p{L}\p{M}]+…
b9145
SYCL: fix multi-GPU system RAM exhaustion by using Level Zero allocations ( #21597 ) SYCL: fix multi-GPU system RAM exhaustion by using Level Zero allocations Replace sycl::malloc_device with zeMemAllocDevice for GPU memory allocation in the SYCL backend. sycl::malloc_device…