llama.cpp releases

469 articles archived

llama.cpp releases dev-tools 1mo ago

b9436

opencl: support bf16 by converting to f16 (#23839)
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED; macOS Intel (x64); iOS XCFramework
Linux: Ubuntu x64 (CPU); Ubuntu arm64 (CPU); Ubuntu s390x (CPU); Ubuntu x64 (Vulkan); Ubuntu arm64…

18
llama.cpp releases dev-tools 1mo ago

b9434

TP: fix granularity for Qwen 3.5/3.6 + 3 GPUs (#23843)
fix afmoe TP
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED; macOS Intel (x64); iOS XCFramework
Linux: Ubuntu x64 (CPU); Ubuntu…

25
llama.cpp releases dev-tools 1mo ago

b9433

metal : restore im2col implementation for large kernels (#23901)
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED; macOS Intel (x64); iOS XCFramework
Linux: Ubuntu x64 (CPU); Ubuntu arm64 (CPU); Ubuntu s390x (CPU); Ubuntu x64 (Vulkan)…

31
llama.cpp releases dev-tools 1mo ago

b9432

test: (test-llama-archs) log the config name first (#23885)
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED; macOS Intel (x64); iOS XCFramework
Linux: Ubuntu x64 (CPU); Ubuntu arm64 (CPU); Ubuntu s390x (CPU); Ubuntu x64 (Vulkan); Ubuntu…

37
llama.cpp releases dev-tools 1mo ago

b9431

ci : update ios-xcode release job to macos-26 (#23906)
ci : disable libcommon build from xcframework; ocd : fix name; ci : ios-xcode change to macos-26; cont : pin xcode; cont : pin xcode to minor version
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI…

34
llama.cpp releases dev-tools 1mo ago

b9430

ggml : add some lsx support (#23798)
loongarch : optimize LSX fp16 load/store with native intrinsics. Use __lsx_vfcvtl_s_h and __lsx_vfcvt_h_s instead of scalar loops in __lsx_f16x4_load and __lsx_f16x4_store.
loongarch : add LSX implementation for q8_0 dot product
loongarch :…

21
llama.cpp releases dev-tools 1mo ago

b9428

ci : fix s390x release job (#23898)
ci : multi-thread build for ios-xcode; ocd : names
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED; macOS Intel (x64); iOS XCFramework
Linux: Ubuntu x64 (CPU); Ubuntu arm64…

6
llama.cpp releases dev-tools 1mo ago

b9426: llama : do not skip iGPU when only RPC devices are present (#23868)

After #23007 reclassified integrated CUDA/HIP devices as IGPU, the device selection logic dropped the local iGPU whenever any RPC server was added, because RPC devices made model->devices non-empty. On systems where the "iGPU" is the main compute device (e.g. Strix Halo with 128…
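
The selection bug described in this entry can be illustrated with a minimal sketch. The types and the function `select_devices` below are hypothetical and are not the llama.cpp API; the fix shown (testing for local discrete GPUs instead of testing whether the device list is empty) is one plausible reading of the note, not a copy of the actual patch.

```cpp
#include <string>
#include <vector>

// Hypothetical device categories for illustration; not the actual ggml enums.
enum class DeviceKind { DiscreteGpu, IntegratedGpu, Rpc };

struct Device {
    std::string name;
    DeviceKind  kind;
};

// Choose the devices to offload to. The integrated GPU is kept whenever no
// local discrete GPU exists, regardless of how many RPC devices are present.
std::vector<Device> select_devices(const std::vector<Device> & available) {
    std::vector<Device> rpc, gpus, igpus;
    for (const Device & d : available) {
        switch (d.kind) {
            case DeviceKind::Rpc:           rpc.push_back(d);   break;
            case DeviceKind::DiscreteGpu:   gpus.push_back(d);  break;
            case DeviceKind::IntegratedGpu: igpus.push_back(d); break;
        }
    }

    std::vector<Device> selected = rpc;
    selected.insert(selected.end(), gpus.begin(), gpus.end());

    // A check on selected.empty() here would be false as soon as one RPC
    // device exists, so the local iGPU would be dropped. Checking the local
    // discrete GPUs instead avoids that.
    if (gpus.empty()) {
        selected.insert(selected.end(), igpus.begin(), igpus.end());
    }
    return selected;
}
```

With one RPC device and one integrated GPU as input, this sketch returns both devices; adding a discrete GPU to the input makes it return the RPC device and the discrete GPU only.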

12
llama.cpp releases dev-tools 1mo ago

b9415

download: add option to skip_download (#23059)
fix; fix 2; if file doesn't exist, respect skip_download flag
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED; macOS Intel (x64); iOS XCFramework…

16
llama.cpp releases dev-tools 1mo ago

b9414

mtmd: Add DeepSeekOCR 2 Support (#20975)
mtmd: DeepSeek-OCR 2 support, with multi-tile dynamic resolution introduced clip_image_f32::add_viewsep address PR review drop redundant ggml_cpy ops in both deepseekocr versions build drop no-op ggml_cont in build_sam assert…

30
llama.cpp releases dev-tools 1mo ago

b9413

CUDA: Check PTX version on host side to guard PDL dispatch (#23530)
Checking on __CUDA_ARCH_LIST__ alone is insufficient for JIT, as this variable doesn't differentiate between compiling for say sm_90, sm_90a or sm_90f…

26
llama.cpp releases dev-tools 1mo ago

b9412

server: bump timeout to 3600s (#23842)
nits: change wording
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED; macOS Intel (x64); iOS XCFramework
Linux: Ubuntu x64 (CPU); Ubuntu arm64 (CPU); Ubuntu s390x…

32
llama.cpp releases dev-tools 1mo ago

b9411

model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (#23346)
llama : support DeepSeek V3.2 model family (with DSA lightning indexer); convert : handle DeepseekV32ForCausalLM architecture; ggml : support for f16 GGML_OP_FILL…

34
llama.cpp releases dev-tools 1mo ago

b9410

llama: use f16 mask for FA to save VRAM (#23764)
llama: use f16 mask for FA; review: add llama_cast + formatting; simplify
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED; macOS Intel (x64); iOS XCFramework
Linux: Ubuntu x64 (CPU)…

7
llama.cpp releases dev-tools 1mo ago

b9409

sync : ggml
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED; macOS Intel (x64); iOS XCFramework
Linux: Ubuntu x64 (CPU); Ubuntu arm64 (CPU); Ubuntu s390x (CPU); Ubuntu x64 (Vulkan); Ubuntu arm64 (Vulkan); Ubuntu x64 (ROCm 7.2); Ubuntu x64…

7
llama.cpp releases dev-tools 1mo ago

b9406

llama: add llm_graph_input_mtp (#23643)
rename input_mtp -> input_token_embd; add TODO about mtmd embedding; cont : clean-up
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64,…

38
llama.cpp releases dev-tools 1mo ago

b9405

app : move licences to llama-app (#23824)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED; macOS Intel (x64); iOS XCFramework
Linux: Ubuntu x64 (CPU); Ubuntu arm64 (CPU); Ubuntu s390x…

4
llama.cpp releases dev-tools 1mo ago

b9403

meta : Add missing buffer set in allreduce fallback !COMPUTE clear (#23480)
Without this at least the vulkan backend will skip the * 0 for !COMPUTE tensors, causing corrupt output.
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED…

26
llama.cpp releases dev-tools 1mo ago

b9402

hexagon: basic/generic op fusion support and RMS_NORM+MUL fusion (#23835)
Updating infra to enable op fusion and using RMS_NORM+MUL as the use-case.
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED; macOS Intel (x64); iOS XCFramework…

17
llama.cpp releases dev-tools 1mo ago

b9401

mtmd-debug: add color and rainbow mode (#23829)
fix M_PI max_dist
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED; macOS Intel (x64); iOS XCFramework
Linux: Ubuntu x64 (CPU); Ubuntu arm64 (CPU)…

32
llama.cpp releases dev-tools 1mo ago

b9400

mtmd: fix gemma 4 projector pre_norm (#23822)
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED; macOS Intel (x64); iOS XCFramework
Linux: Ubuntu x64 (CPU); Ubuntu arm64 (CPU); Ubuntu s390x (CPU); Ubuntu x64 (Vulkan); Ubuntu arm64 (Vulkan)…

22
llama.cpp releases dev-tools 1mo ago

b9399

opencl: move backend info printing into its own function (#23702)
opencl: move backend info print into its own function; opencl: move new log line; opencl: fix for non adreno path
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED; macOS…

20
llama.cpp releases dev-tools 1mo ago

b9404

cuda : disables launch_fattn PDL enrollment due to compiler bug (#23825)

25
llama.cpp releases dev-tools 1mo ago

b9395

app : improve help output (#23805)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED; macOS Intel (x64); iOS XCFramework
Linux: Ubuntu x64 (CPU); Ubuntu arm64 (CPU); Ubuntu s390x (CPU)…

25
llama.cpp releases dev-tools 1mo ago

b9394

mtmd: n_head_kv defaults to n_head (#23782)
removed AI-generated comment
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED; macOS Intel (x64); iOS XCFramework
Linux: Ubuntu x64 (CPU); Ubuntu arm64 (CPU); Ubuntu s390x (CPU); Ubuntu x64…

9
llama.cpp releases dev-tools 1mo ago

b9393

mtmd: fix gemma 4 audio rms norm eps (#23815)
Update tools/mtmd/clip.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
macOS/iOS: macOS Apple Silicon (arm64); macOS…

34
llama.cpp releases dev-tools 1mo ago

b9391

arg: Add LLAMA_ARG_API_KEY_FILE environment variable for --api-key-file (#23167)
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED; macOS Intel (x64); iOS XCFramework
Linux: Ubuntu x64 (CPU); Ubuntu arm64 (CPU); Ubuntu s390x (CPU); Ubuntu…

4
llama.cpp releases dev-tools 1mo ago

b9389

ggml: auto apply iGPU flag CUDA/HIP if integrated device (#23007)
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED; macOS Intel (x64); iOS XCFramework
Linux: Ubuntu x64 (CPU); Ubuntu arm64 (CPU); Ubuntu s390x (CPU); Ubuntu x64 (Vulkan)…

27
llama.cpp releases dev-tools 1mo ago

b9388

mmvq Optim: add MMVQ_PARAMETERS_TURING(mmvq_parameter_table_id) for … (#23729)
mmvq Optim: add MMVQ_PARAMETERS_TURING(mmvq_parameter_table_id) for SM75 TURING avoid a mismatch for JIT compilation of Turing device code for Ampere or newer
Co-authored-by: Johannes Gäßler…

38
llama.cpp releases dev-tools 1mo ago

b9387

CUDA: route batch>=4 quantized matmul to MMQ on AMD MFMA hardware (#23227)
CUDA: per-quant MMVQ/MMQ batch threshold on AMD MFMA hardware
The dispatcher uses a single global threshold (MMVQ_MAX_BATCH_SIZE = 8) to choose between mul_mat_vec_q (per-row GEMV) and mul_mat_q…
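
The threshold-based dispatch named in this entry can be sketched roughly as below. Everything except the `MMVQ_MAX_BATCH_SIZE = 8` value quoted above is an assumption: the function `choose_kernel`, the `MFMA_MMQ_MIN_BATCH` constant (taken from the "batch>=4" wording of the title), and the direction of the comparison are illustrative only, and the per-quantization-type thresholds that the change reportedly introduces are not modeled because the note is truncated.

```cpp
// Hypothetical names for illustration; not the actual ggml-cuda dispatcher.
constexpr int MMVQ_MAX_BATCH_SIZE = 8;  // global threshold quoted in the release note
constexpr int MFMA_MMQ_MIN_BATCH  = 4;  // assumed from the "batch>=4" title wording

enum class MatmulKernel { MMVQ, MMQ };

// Pick between the per-row GEMV path (MMVQ) and the tiled matmul path (MMQ).
MatmulKernel choose_kernel(int batch, bool amd_mfma) {
    if (amd_mfma) {
        // Assumed behavior on AMD MFMA hardware: switch to MMQ earlier.
        return batch >= MFMA_MMQ_MIN_BATCH ? MatmulKernel::MMQ : MatmulKernel::MMVQ;
    }
    // Assumed default: small batches stay on MMVQ up to the global threshold.
    return batch <= MMVQ_MAX_BATCH_SIZE ? MatmulKernel::MMVQ : MatmulKernel::MMQ;
}
```

Under these assumptions a batch of 4 goes to MMQ on MFMA hardware but stays on MMVQ elsewhere.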

38
llama.cpp releases dev-tools 1mo ago

b9386

server: minor tweaks to use more cpp features (#23785)
misc(server): add default port to impl RAII; misc(server): register_gcp_compat() can be const; misc(server): use proper cpp const/auto methods; misc(server): do not reset a unique_ptr, use make_unique instead to be exception…

34
llama.cpp releases dev-tools 1mo ago

b9384

vulkan: fast path for walsh-hadamard transform (#23687)
disable for intel due to segfault
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED; macOS Intel (x64); iOS XCFramework
Linux:…

16
llama.cpp releases dev-tools 1mo ago

b9383

chat : add Granite 4.1 chat template (#23518)
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED; macOS Intel (x64); iOS XCFramework
Linux: Ubuntu x64 (CPU); Ubuntu arm64 (CPU); Ubuntu s390x (CPU); Ubuntu x64 (Vulkan); Ubuntu arm64 (Vulkan)…

38
llama.cpp releases dev-tools 1mo ago

b9382

vulkan: fix wrong index variable in inner loop (#23665)
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED; macOS Intel (x64); iOS XCFramework
Linux: Ubuntu x64 (CPU); Ubuntu arm64 (CPU); Ubuntu s390x (CPU); Ubuntu x64 (Vulkan); Ubuntu…

7
llama.cpp releases dev-tools 1mo ago

b9381

vulkan: Fix memory logger unsafe iterator access (#23667)
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED; macOS Intel (x64); iOS XCFramework
Linux: Ubuntu x64 (CPU); Ubuntu arm64 (CPU); Ubuntu s390x (CPU); Ubuntu x64 (Vulkan); Ubuntu…

36
llama.cpp releases dev-tools 1mo ago

b9380

server, ui : Add support for HTTP ETags in llama-server (#23701)
allow caching of ui elements in llama-server; use fnv_hash; Update tools/server/server-http.cpp; etag has to be set always
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: Xuan-Son Nguyen…
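
As a rough illustration of hash-based ETags, the sketch below derives an ETag from a 64-bit FNV-1a hash of the response body and compares it with the client's `If-None-Match` value. The FNV variant and all function names (`fnv1a_64`, `make_etag`, `is_not_modified`) are assumptions for illustration; the note only says "use fnv_hash" and does not show the llama-server code.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// 64-bit FNV-1a hash over the response body (assumed variant).
uint64_t fnv1a_64(const std::string & data) {
    uint64_t hash = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (unsigned char c : data) {
        hash ^= c;
        hash *= 1099511628211ULL;             // FNV 64-bit prime
    }
    return hash;
}

// Build a quoted ETag value from the body hash, e.g. "cbf29ce484222325".
std::string make_etag(const std::string & body) {
    char buf[32];
    std::snprintf(buf, sizeof(buf), "\"%016llx\"",
                  static_cast<unsigned long long>(fnv1a_64(body)));
    return buf;
}

// True when the client's If-None-Match value equals the current ETag,
// in which case a server could answer 304 Not Modified without a body.
bool is_not_modified(const std::string & if_none_match, const std::string & body) {
    return !if_none_match.empty() && if_none_match == make_etag(body);
}
```

A handler built on this sketch would always send the `ETag` header and skip the body only when `is_not_modified` returns true.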

4
llama.cpp releases dev-tools 1mo ago

b9378

cuda : fix KQ mask offset integer overflow in fattn MMA kernel (#23610)
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED; macOS Intel (x64); iOS XCFramework
Linux: Ubuntu x64 (CPU)…

32
llama.cpp releases dev-tools 1mo ago

b9377

perplexity : fix format specifier in LOG_ERR (#23788)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED; macOS Intel (x64); iOS XCFramework
Linux: Ubuntu x64 (CPU); Ubuntu arm64 (CPU)…

37
llama.cpp releases dev-tools 1mo ago

b9375

ggml: fixed Arm SVE usage bug in vec.h, vec.cpp (#22841)
Updated vec.h/vec.cpp code to accumulate to F32 rather than F16
Change-Id: I0cb789347f2bf60ffaf9047319f727e788c825f8
Signed-off-by: Martin Klacer <martin.klacer@arm.com>
Co-authored-by: Milos Puzovic <Milos.Puzovic@arm.com>…

28
llama.cpp releases dev-tools 1mo ago

b9374

ci : refactor (#23789)
ci : separate CUDA windows workflow + fix names; ci : rename workflow; ci : prefix cache names with workflow name; ci : rename build.yml -> build-cpu.yml; ci : cache keys; ci : fix windows cuda/hip concurrency of release workflow; ci : fix apple cache names; ci…

15
llama.cpp releases dev-tools 1mo ago

b9371

ggml-webgpu: remove legacy constants (#23672)
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED; macOS Intel (x64); iOS XCFramework
Linux: Ubuntu x64 (CPU); Ubuntu arm64 (CPU); Ubuntu s390x (CPU); Ubuntu x64 (Vulkan); Ubuntu arm64 (Vulkan)…

17
llama.cpp releases dev-tools 1mo ago

b9370

hexagon: add support for Q4_1 in MUL_MAT and MUL_MAT_ID (#23647)
hex-mm: add support for Q4_1 matmul/matvec, hvx-only for now; hmx-mm: add support for Q4_1; hex-mm: use Q8_1 dynamic quantization to avoid having to compute sums in the vec_dot; hexagon: fix repack scratch buffer…

6
llama.cpp releases dev-tools 1mo ago

b9368

vulkan: Switch MUL_MAT_VEC to 4 K per iteration for F16/32 (#22887)
Against mesa git, this shows a 4.8% performance improvement for tg128 on Qwen3.5-9B:BF16 on Intel BMG. Note that this breaks some tests until the last…

34
llama.cpp releases dev-tools 1mo ago

b9369

ggml-webgpu: Fix how to dispatch WG to some ops (#23750)
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED; macOS Intel (x64); iOS XCFramework
Linux: Ubuntu x64 (CPU); Ubuntu arm64 (CPU); Ubuntu s390x (CPU); Ubuntu x64 (Vulkan); Ubuntu…

31
llama.cpp releases dev-tools 1mo ago

b9367

vulkan: use GL_NV_cooperative_matrix_decode_vector for faster matmul (#23541)
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED; macOS Intel (x64); iOS XCFramework
Linux: Ubuntu x64 (CPU); Ubuntu arm64 (CPU); Ubuntu s390x (CPU); Ubuntu…

28
llama.cpp releases dev-tools 1mo ago

b9366

vulkan: add REPEAT op support for f16 to f16. (#23298)
feat: extend repeat op for vulkan; feat: add repeat_f16 vulkan pipeline; fix: ensure same dst and src types; fix: use type_size instead of data types; fix: use int16 and int32 for repeat shader op; chore: rename repeat_f* to…

5
llama.cpp releases dev-tools 1mo ago

b9365

ci : move ARM jobs to self-hosted + disable kleidiai mac release (#23780)
ci : move ARM jobs to 3rd-party runners + disable kleidiai release; cont : fix deps + fix names; ocd : fix names; cont : fix PR links
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64,…

33
llama.cpp releases dev-tools 1mo ago

b9360

common : fix env names to all have LLAMA_ARG_ prefix (#23778)
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled); macOS Intel (x64); iOS XCFramework
Linux: Ubuntu x64 (CPU); Ubuntu arm64 (CPU); Ubuntu s390x (CPU); Ubuntu x64 (Vulkan); Ubuntu arm64…

7
llama.cpp releases dev-tools 1mo ago

b9357

vulkan: avoid preferring transfer queue on AMD UMA devices (#22455)
macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled); macOS Intel (x64); iOS XCFramework
Linux: Ubuntu x64 (CPU); Ubuntu arm64 (CPU); Ubuntu s390x (CPU); Ubuntu x64 (Vulkan); Ubuntu…

38
llama.cpp releases dev-tools 1mo ago

b9354

convert: add MiniCPM5 tokenizer support (#23384)
Add minicpm5 pre-tokenizer hash via convert_hf_to_gguf_update.py and implement hardcoded regex handling in llama-vocab.cpp, consistent with other BPE pre-tokenizers.
Co-authored-by: zhangtao <zhangtao2@modelbest.cn>
macOS/iOS:…

11

Page 7 of 10 · 469 articles

Prismix · © 2026 · AI Hub
