llama.cpp releases
469 articles archived
llama.cpp releases dev-tools 1mo ago
b9436
opencl: support bf16 by converting to f16 (#23839)
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED; macOS Intel (x64); iOS XCFramework. Linux: Ubuntu x64 (CPU); Ubuntu arm64 (CPU); Ubuntu s390x (CPU); Ubuntu x64 (Vulkan); Ubuntu arm64…
llama.cpp releases dev-tools 1mo ago
b9434
TP: fix granularity for Qwen 3.5/3.6 + 3 GPUs (#23843)
fix afmoe TP
llama.cpp releases dev-tools 1mo ago
b9433
metal : restore im2col implementation for large kernels (#23901)
llama.cpp releases dev-tools 1mo ago
b9432
test: (test-llama-archs) log the config name first (#23885)
llama.cpp releases dev-tools 1mo ago
b9431
ci : update ios-xcode release job to macos-26 (#23906)
ci : disable libcommon build from xcframework; ocd : fix name; ci : ios-xcode change to macos-26; cont : pin xcode; cont : pin xcode to minor version
llama.cpp releases dev-tools 1mo ago
b9430
ggml : add some lsx support (#23798)
loongarch : optimize LSX fp16 load/store with native intrinsics (use __lsx_vfcvtl_s_h and __lsx_vfcvt_h_s instead of scalar loops in __lsx_f16x4_load and __lsx_f16x4_store); loongarch : add LSX implementation for q8_0 dot product; loongarch :…
llama.cpp releases dev-tools 1mo ago
b9428
ci : fix s390x release job (#23898)
ci : multi-thread build for ios-xcode; ocd : names
llama.cpp releases dev-tools 1mo ago
b9426: llama : do not skip iGPU when only RPC devices are present (#23868)
After #23007 reclassified integrated CUDA/HIP devices as IGPU, the device selection logic dropped the local iGPU whenever any RPC server was added, because RPC devices made model->devices non-empty. On systems where the "iGPU" is the main compute device (e.g. Strix Halo with 128…
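The selection problem described above can be modelled in a few lines. This is a hedged sketch rather than llama.cpp's code: the `Device` type, the `kind` labels, and both function names are hypothetical, and the actual fix in #23868 may differ in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    kind: str  # hypothetical labels: "gpu", "igpu", or "rpc"


def select_devices_buggy(available):
    """Model of the regression: the iGPU is used only if nothing else was selected."""
    selected = [d for d in available if d.kind in ("gpu", "rpc")]
    if not selected:  # any RPC device makes the list non-empty, so the iGPU is dropped
        selected += [d for d in available if d.kind == "igpu"]
    return selected


def select_devices_fixed(available):
    """Keep the iGPU unless a local discrete GPU exists; RPC devices do not count."""
    selected = [d for d in available if d.kind in ("gpu", "rpc")]
    has_local_gpu = any(d.kind == "gpu" for d in available)
    if not has_local_gpu:
        selected += [d for d in available if d.kind == "igpu"]
    return selected


# One RPC server plus a local integrated GPU (device names are made up).
devices = [Device("rpc0", "rpc"), Device("local-igpu", "igpu")]
buggy_names = [d.name for d in select_devices_buggy(devices)]
fixed_names = [d.name for d in select_devices_fixed(devices)]
```

With one RPC device and one iGPU, the first function returns only the RPC device, while the second keeps both.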
llama.cpp releases dev-tools 1mo ago
b9415
download: add option to skip_download (#23059)
fix; fix 2; if file doesn't exist, respect skip_download flag
llama.cpp releases dev-tools 1mo ago
b9414
mtmd: Add DeepSeekOCR 2 Support (#20975)
mtmd: DeepSeek-OCR 2 support, with multi-tile dynamic resolution introduced clip_image_f32::add_viewsep address PR review drop redundant ggml_cpy ops in both deepseekocr versions build drop no-op ggml_cont in build_sam assert…
llama.cpp releases dev-tools 1mo ago
b9413
CUDA: Check PTX version on host side to guard PDL dispatch (#23530)
Checking on __CUDA_ARCH_LIST__ alone is insufficient for JIT, as this variable doesn't differentiate between compiling for, say, sm_90, sm_90a or sm_90f…
llama.cpp releases dev-tools 1mo ago
b9412
server: bump timeout to 3600s (#23842)
nits: change wording
llama.cpp releases dev-tools 1mo ago
b9411
model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (#23346)
llama : support DeepSeek V3.2 model family (with DSA lightning indexer); convert : handle DeepseekV32ForCausalLM architecture; ggml : support for f16 GGML_OP_FILL…
llama.cpp releases dev-tools 1mo ago
b9410
llama: use f16 mask for FA to save VRAM (#23764)
review: add llama_cast + formatting; simplify
llama.cpp releases dev-tools 1mo ago
b9409
sync : ggml
llama.cpp releases dev-tools 1mo ago
b9406
llama: add llm_graph_input_mtp (#23643)
rename input_mtp -> input_token_embd; add TODO about mtmd embedding; cont : clean-up
Co-authored-by: Georgi Gerganov ggerganov@gmail.com
llama.cpp releases dev-tools 1mo ago
b9405
app : move licences to llama-app (#23824)
Signed-off-by: Adrien Gallouët angt@huggingface.co
llama.cpp releases dev-tools 1mo ago
b9403
meta : Add missing buffer set in allreduce fallback !COMPUTE clear (#23480)
Without this, at least the Vulkan backend will skip the * 0 for !COMPUTE tensors, causing corrupt output.
llama.cpp releases dev-tools 1mo ago
b9402
hexagon: basic/generic op fusion support and RMS_NORM+MUL fusion (#23835)
Updating infra to enable op fusion and using RMS_NORM+MUL as the use-case.
llama.cpp releases dev-tools 1mo ago
b9401
mtmd-debug: add color and rainbow mode (#23829)
fix M_PI max_dist
llama.cpp releases dev-tools 1mo ago
b9400
mtmd: fix gemma 4 projector pre_norm (#23822)
llama.cpp releases dev-tools 1mo ago
b9399
opencl: move backend info printing into its own function (#23702)
opencl: move backend info print into its own function; opencl: move new log line; opencl: fix for non adreno path
llama.cpp releases dev-tools 1mo ago
b9404
cuda : disables launch_fattn PDL enrollment due to compiler bug (#23825)
llama.cpp releases dev-tools 1mo ago
b9395
app : improve help output (#23805)
Signed-off-by: Adrien Gallouët angt@huggingface.co
llama.cpp releases dev-tools 1mo ago
b9394
mtmd: n_head_kv defaults to n_head (#23782)
removed AI-generated comment
llama.cpp releases dev-tools 1mo ago
b9393
mtmd: fix gemma 4 audio rms norm eps (#23815)
Update tools/mtmd/clip.cpp
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
llama.cpp releases dev-tools 1mo ago
b9391
arg: Add LLAMA_ARG_API_KEY_FILE environment variable for --api-key-file (#23167)
llama.cpp releases dev-tools 1mo ago
b9389
ggml: auto apply iGPU flag CUDA/HIP if integrated device (#23007)
llama.cpp releases dev-tools 1mo ago
b9388
mmvq Optim: add MMVQ_PARAMETERS_TURING(mmvq_parameter_table_id) for … (#23729)
mmvq Optim: add MMVQ_PARAMETERS_TURING(mmvq_parameter_table_id) for SM75 TURING; avoid a mismatch for JIT compilation of Turing device code for Ampere or newer
Co-authored-by: Johannes Gäßler…
llama.cpp releases dev-tools 1mo ago
b9387
CUDA: route batch>=4 quantized matmul to MMQ on AMD MFMA hardware (#23227)
CUDA: per-quant MMVQ/MMQ batch threshold on AMD MFMA hardware. The dispatcher uses a single global threshold (MMVQ_MAX_BATCH_SIZE = 8) to choose between mul_mat_vec_q (per-row GEMV) and mul_mat_q…
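The threshold-based routing described in this entry can be sketched as follows. Only `MMVQ_MAX_BATCH_SIZE = 8` and the two kernel names come from the note; the per-quant table, its entries, the `pick_kernel` helper, and the `batch <= limit` comparison direction are assumptions made for illustration, not the actual dispatcher.

```python
MMVQ_MAX_BATCH_SIZE = 8  # single global threshold named in the release note

# Hypothetical per-quant overrides for AMD MFMA hardware. The real table is not
# given in the note; a limit of 3 sends batch >= 4 to MMQ, as the title describes.
MFMA_MMVQ_MAX_BATCH = {"q4_0": 3, "q8_0": 3}


def pick_kernel(quant, batch, has_mfma):
    """Return the kernel family a quantized matmul would be routed to."""
    limit = MMVQ_MAX_BATCH_SIZE
    if has_mfma:
        # Fall back to the global threshold for quant types without an override.
        limit = MFMA_MMVQ_MAX_BATCH.get(quant, MMVQ_MAX_BATCH_SIZE)
    # Assumed direction: small batches use the per-row GEMV kernel.
    return "mul_mat_vec_q" if batch <= limit else "mul_mat_q"
```

For example, `pick_kernel("q4_0", 4, has_mfma=True)` selects `mul_mat_q` under these assumed values, while the same call with `has_mfma=False` stays on `mul_mat_vec_q`.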
llama.cpp releases dev-tools 1mo ago
b9386
server: minor tweaks to use more cpp features (#23785)
misc(server): add default port to impl RAII; misc(server): register_gcp_compat() can be const; misc(server): use proper cpp const/auto methods; misc(server): do not reset a unique_ptr, use make_unique instead to be exception…
llama.cpp releases dev-tools 1mo ago
b9384
vulkan: fast path for walsh-hadamard transform (#23687)
disable for intel due to segfault
llama.cpp releases dev-tools 1mo ago
b9383
chat : add Granite 4.1 chat template (#23518)
llama.cpp releases dev-tools 1mo ago
b9382
vulkan: fix wrong index variable in inner loop (#23665)
llama.cpp releases dev-tools 1mo ago
b9381
vulkan: Fix memory logger unsafe iterator access (#23667)
llama.cpp releases dev-tools 1mo ago
b9380
server, ui : Add support for HTTP ETags in llama-server (#23701)
allow caching of ui elements in llama-server; use fnv_hash; Update tools/server/server-http.cpp; etag has to be set always
Co-authored-by: Xuan-Son Nguyen thichthat@gmail.com
Co-authored-by: Xuan-Son Nguyen…
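As a rough illustration of hash-based ETags, the sketch below derives an ETag from a response body and answers a matching `If-None-Match` with 304. The note only says "fnv_hash"; the choice of 64-bit FNV-1a, the hex formatting, and the `make_etag` and `respond` helpers are assumptions, and the exact-match comparison ignores weak validators and multi-value headers.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte into the state, then multiply by the prime."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h


def make_etag(body: bytes) -> str:
    """Format the hash as a quoted strong ETag value."""
    return '"%016x"' % fnv1a_64(body)


def respond(body: bytes, if_none_match=None):
    """Return (status, headers, payload) for a cacheable static asset."""
    etag = make_etag(body)
    headers = {"ETag": etag}  # set on every response, including 304
    if if_none_match == etag:
        return 304, headers, b""
    return 200, headers, body


first = respond(b"<html>ui</html>")
second = respond(b"<html>ui</html>", if_none_match=first[1]["ETag"])
```

The first request receives the full body with an `ETag` header; replaying that value in `If-None-Match` yields a 304 with an empty payload, so the client can reuse its cached copy.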
llama.cpp releases dev-tools 1mo ago
b9378
cuda : fix KQ mask offset integer overflow in fattn MMA kernel (#23610)
Co-authored-by: Stanisław Szymczyk sszymczy@gmail.com
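The general class of bug named here, an offset product that no longer fits in 32 bits, can be shown with a small model. The note does not say which expression overflowed, so the `row * stride` form and the numbers below are hypothetical. Signed overflow is undefined behaviour in C and C++; this sketch models the common two's-complement wraparound.

```python
def wrap_int32(x: int) -> int:
    """Reduce an integer to the value a wrapping 32-bit signed int would hold."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def mask_offset_32(row: int, stride: int) -> int:
    """Offset computed in 32-bit arithmetic (wraps for large inputs)."""
    return wrap_int32(row * stride)


def mask_offset_64(row: int, stride: int) -> int:
    """Offset computed in a wide type; Python integers do not overflow."""
    return row * stride


# Hypothetical sizes chosen so that the product exceeds 2**31 - 1.
row, stride = 70_000, 40_000
narrow = mask_offset_32(row, stride)
wide = mask_offset_64(row, stride)
```

Here the wide result is 2,800,000,000, while the 32-bit version wraps to a negative value, which would index memory before the start of the mask.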
llama.cpp releases dev-tools 1mo ago
b9377
perplexity : fix format specifier in LOG_ERR (#23788)
Signed-off-by: Adrien Gallouët angt@huggingface.co
llama.cpp releases dev-tools 1mo ago
b9375
ggml: fixed Arm SVE usage bug in vec.h, vec.cpp (#22841)
Updated vec.h/vec.cpp code to accumulate to F32 rather than F16
Change-Id: I0cb789347f2bf60ffaf9047319f727e788c825f8
Signed-off-by: Martin Klacer martin.klacer@arm.com
Co-authored-by: Milos Puzovic Milos.Puzovic@arm.com…
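One plausible reason to accumulate in F32 rather than F16 is rounding error in long sums. The sketch below is not the SVE code; it emulates each accumulator width by rounding through Python's `struct` half- and single-precision formats after every addition. The helper names and the input values are illustrative only.

```python
import struct


def to_f16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_f32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(values, round_fn):
    """Sum values while rounding the running total to the accumulator's width."""
    acc = 0.0
    for v in values:
        acc = round_fn(acc + v)
    return acc


# 10,000 copies of a small F16 input value (about 0.001).
values = [to_f16(0.001)] * 10_000
exact = sum(values)                    # float64 reference
sum_f16 = accumulate(values, to_f16)   # F16 accumulator
sum_f32 = accumulate(values, to_f32)   # F32 accumulator
```

With an F16 accumulator the running total stops growing once the addend falls below half the spacing between adjacent F16 values, so the result lands far below the reference; the F32 accumulator stays close to it.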
llama.cpp releases dev-tools 1mo ago
b9374
ci : refactor (#23789)
ci : separate CUDA windows workflow + fix names; ci : rename workflow; ci : prefix cache names with workflow name; ci : rename build.yml -> build-cpu.yml; ci : cache keys; ci : fix windows cuda/hip concurrency of release workflow; ci : fix apple cache names; ci…
llama.cpp releases dev-tools 1mo ago
b9371
ggml-webgpu: remove legacy constants (#23672)
llama.cpp releases dev-tools 1mo ago
b9370
hexagon: add support for Q4_1 in MUL_MAT and MUL_MAT_ID (#23647)
hex-mm: add support for Q4_1 matmul/matvec, hvx-only for now; hmx-mm: add support for Q4_1; hex-mm: use Q8_1 dynamic quantization to avoid having to compute sums in the vec_dot; hexagon: fix repack scratch buffer…
llama.cpp releases dev-tools 1mo ago
b9368
vulkan: Switch MUL_MAT_VEC to 4 K per iteration for F16/32 (#22887)
Against mesa git, this shows a 4.8% performance improvement for tg128 on Qwen3.5-9B:BF16 on Intel BMG. Note that this breaks some tests until the last…
llama.cpp releases dev-tools 1mo ago
b9369
ggml-webgpu: Fix how to dispatch WG to some ops (#23750)
llama.cpp releases dev-tools 1mo ago
b9367
vulkan: use GL_NV_cooperative_matrix_decode_vector for faster matmul (#23541)
llama.cpp releases dev-tools 1mo ago
b9366
vulkan: add REPEAT op support for f16 to f16. (#23298)
feat: extend repeat op for vulkan; feat: add repeat_f16 vulkan pipeline; fix: ensure same dst and src types; fix: use type_size instead of data types; fix: use int16 and int32 for repeat shader op; chore: rename repeat_f* to…
llama.cpp releases dev-tools 1mo ago
b9365
ci : move ARM jobs to self-hosted + disable kleidiai mac release (#23780)
ci : move ARM jobs to 3rd-party runners + disable kleidiai release; cont : fix deps + fix names; ocd : fix names; cont : fix PR links
llama.cpp releases dev-tools 1mo ago
b9360
common : fix env names to all have LLAMA_ARG_ prefix (#23778)
llama.cpp releases dev-tools 1mo ago
b9357
vulkan: avoid preferring transfer queue on AMD UMA devices (#22455)
llama.cpp releases dev-tools 1mo ago
b9354
convert: add MiniCPM5 tokenizer support (#23384)
Add minicpm5 pre-tokenizer hash via convert_hf_to_gguf_update.py and implement hardcoded regex handling in llama-vocab.cpp, consistent with other BPE pre-tokenizers.
Co-authored-by: zhangtao zhangtao2@modelbest.cn