llama.cpp releases

469 articles archived · Visit source ↗ · RSS

llama.cpp releases dev-tools 1mo ago

b9257

vulkan: optimize operations in the IM2COL shader ( #22685 ) vulkan: optimize operations in the IM2COL shader Add comments and improve the code formatting macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux:…

23
llama.cpp releases dev-tools 1mo ago

b9255

hexagon: HMX quantized matmul rework ( #23368 ) hmx-mm: update debug logging in hmx-mm hmx-mm: update dequant logic to use HVX_vector_x2/4 hmx-mm: remove non-pipelined version of the quantize matmul It seems that we don't really need the non-pipelined version hmx-mm: use activation…

36
llama.cpp releases dev-tools 1mo ago

b9254

Programmatic Dependent Launch (PDL) for more performance on newer NVIDIA GPUs (Hopper+) ( #22522 ) Adds initial PDL setup. Adds PDL barriers based on simple heuristic: place "sync" before first input pointer access, and "launch" after last write, e.g. to tensors like dst.…

17
llama.cpp releases dev-tools 1mo ago

b9253

app : introduce the llama unified executable ( #23296 ) app : introduce the llama unified executable Signed-off-by: Adrien Gallouët angt@huggingface.co Use serve for server Signed-off-by: Adrien Gallouët angt@huggingface.co Hide completion and bench, add help command…

26
llama.cpp releases dev-tools 1mo ago

b9251

mtmd: fit_params now take into account mmproj ( #21489 ) mtmd: fit_params now take into account mmproj rename alloc_compute_meta to reserve_compute_meta rm unused functions add ggml_backend_dev_t support add debug log macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon…

23
llama.cpp releases dev-tools 1mo ago

b9247

metal : optimize pad + cpy ( #23354 ) metal : optimize pad metal : optimize cpy cont : better row packing in threadgroup macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64…

29
llama.cpp releases dev-tools 1mo ago

b9246: snapdragon: update toolchain to v0.6 (#23369)

snapdragon: update compiler flags to enable all CPU features snapdragon: update readme to point to toolchain v0.6 snapdragon: bump toolchain docker to v0.6

37
llama.cpp releases dev-tools 1mo ago

b9245

ggml-cuda: tune RDNA3 Q6_K MMVQ nwarps ( #23349 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu…

35
llama.cpp releases dev-tools 1mo ago

b9244

opencl: add MoE support for q4_k, q5_k, q6_k on Adreno ( #23303 ) opencl: add q4_k moe support opencl: add q5_k moe support opencl: add q6_k moe support opencl: adjust format Co-authored-by: Li He lih@qti.qualcomm.com macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon…

33
llama.cpp releases dev-tools 1mo ago

b9243

hexagon: add MROPE and IMROPE support in HTP rope op ( #23317 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

18
llama.cpp releases dev-tools 1mo ago

b9235

llama : MTP clean-up ( #23269 ) llama : disable equal splits for recurrent memory with partial rollback spec : re-enable p-min with MTP drafts spec : re-enable ngram spec in combination with RS rollback spec : fix ngram-map-* params spec : fix acceptance logic in combined ngram…

27
llama.cpp releases dev-tools 1mo ago

b9240

common: fix --help for --verbosity ( #23278 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64…

5
llama.cpp releases dev-tools 1mo ago

b9239

common: fix --fit verbosity with --verbosity 4 ( #23282 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

35
llama.cpp releases dev-tools 1mo ago

b9222

hexagon: add support for TRI op ( #22822 ) Hexagon: TRI HVX Kernel addition to ggml hexagon HTP ops and context addressed PR review comments for TRI op hexagon: clang format hex-unary: remove merge conflict markers hex-ggml: remove duplicate op cases (merge conflict) hex-ggml:…

36
llama.cpp releases dev-tools 1mo ago

b9221

ggml-hexagon: add PAD op HVX kernel ( #23078 ) ggml-hexagon: add PAD op HVX kernel Implements GGML_OP_PAD on the Hexagon HTP backend using HVX vectorized kernels. Supports zero-padding and circular padding across all 4 tensor dimensions. hex-ggml: remove duplicate op cases…
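The two padding modes named in this entry can be illustrated with a minimal Python sketch. It is not the HVX kernel: it assumes a single 1-D row padded only at its end, and the function names `pad_zero` and `pad_circular` are hypothetical, chosen for illustration.

```python
def pad_zero(row, pad):
    """Append `pad` zeros to the end of a 1-D row."""
    return list(row) + [0] * pad


def pad_circular(row, pad):
    """Append `pad` elements by wrapping around to the start of the row."""
    if not row:
        raise ValueError("cannot circularly pad an empty row")
    n = len(row)
    return list(row) + [row[i % n] for i in range(pad)]


row = [1, 2, 3]
print(pad_zero(row, 2))      # [1, 2, 3, 0, 0]
print(pad_circular(row, 4))  # [1, 2, 3, 1, 2, 3, 1]
```

Applying the same idea independently along each of the 4 tensor dimensions would give the multi-dimensional behaviour the entry describes.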

26
llama.cpp releases dev-tools 1mo ago

b9219

common : remove hf cache migration ( #23266 ) Signed-off-by: Adrien Gallouët angt@huggingface.co macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU)…

20
llama.cpp releases dev-tools 1mo ago

b9216

ui: Refactor models store, MCP service, and gate logs behind VITE_DEBUG ( #23236 ) refactor: Scope console logs to DEV + VITE_DEBUG env vars refactor: skip MCP proxy probe when no server requires it refactor: suppress expected disconnect errors during MCP client shutdown…

33
llama.cpp releases dev-tools 1mo ago

b9213

llama: initialize pre-norm embedding mask flag ( #23256 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

21
llama.cpp releases dev-tools 1mo ago

b9208

sycl: route small f32 matmuls to oneMKL, bypass oneDNN ( #22150 ) Signed-off-by: Chun Tao chun.tao@intel.com Co-authored-by: Chun Tao chun.tao@intel.com macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux:…

18
llama.cpp releases dev-tools 1mo ago

b9209: sycl: scalar SWAR byte-subtract in Q6_K MMVQ dot product (#22156)

Signed-off-by: Chun Tao chun.tao@intel.com Co-authored-by: Chun Tao chun.tao@intel.com
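For readers unfamiliar with the term, SWAR ("SIMD within a register") byte-subtract means subtracting several packed bytes with ordinary integer arithmetic while stopping borrows from crossing byte boundaries. The sketch below shows the general technique in Python, assuming a 32-bit word holding four unsigned byte lanes; it is not taken from the pull request, and the names `swar_sub_bytes` and `sub_bytes_reference` are hypothetical.

```python
H = 0x80808080  # high bit of each of the four byte lanes


def swar_sub_bytes(x, y):
    """Per-byte (x - y) mod 256 on four bytes packed into a 32-bit word."""
    # Force each lane's high bit on in x and off in y so that no lane
    # ever needs to borrow from its neighbour.
    low = (x | H) - (y & ~H)
    # Restore the correct high bit of every lane.
    return low ^ ((x ^ ~y) & H)


def sub_bytes_reference(x, y):
    """Lane-by-lane reference used to check the SWAR version."""
    out = 0
    for lane in range(4):
        shift = 8 * lane
        a = (x >> shift) & 0xFF
        b = (y >> shift) & 0xFF
        out |= ((a - b) & 0xFF) << shift
    return out


x, y = 0x28053F20, 0x20202020
assert swar_sub_bytes(x, y) == sub_bytes_reference(x, y)
print(hex(swar_sub_bytes(x, y)))
```

The benefit is that one integer subtraction and a few bitwise operations replace four separate byte subtractions.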

11
llama.cpp releases dev-tools 1mo ago

b9204

feat: Support d_conv=15 for ssm-conv.cu ( #23017 ) Branch: ModalityConditionalAdapters AI-usage: none Signed-off-by: Gabe Goodhart ghart@us.ibm.com macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux:…

21
llama.cpp releases dev-tools 1mo ago

b9203

cmake : fix LLAMA_BUILD_UI logic ( #23190 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64…

4
llama.cpp releases dev-tools 1mo ago

b9202

cmake : do not install conversion script ( #23204 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…

5
llama.cpp releases dev-tools 1mo ago

b9200

llama: avoid copying logits during prompt decode in MTP ( #23198 ) llama: avoid copying logits during prompt decode in MTP review: update comment llama-graph: call set_output for t_h_pre_norm macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled)…

10
llama.cpp releases dev-tools 1mo ago

b9198

ggml-vulkan/CMakeLists: add a check for SPIRV-Headers ( #22009 ) ci/run: set explicit SPIR-V Headers search path for macOS vulkan CI For whatever reason, the files are under additional sub-path vulkan/ under the cmake directory, which does not match either current LunarG macOS…

8
llama.cpp releases dev-tools 1mo ago

b9197

vulkan: add cpy bf16 -> f32 pipelines ( #22677 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu…

23
llama.cpp releases dev-tools 1mo ago

b9196

vulkan: Support unaligned tensors for ROPE ( #22637 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…

36
llama.cpp releases dev-tools 1mo ago

b9194

vulkan: fuse SSM_CONV + BIAS + SILU ( #22653 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu…

34
llama.cpp releases dev-tools 1mo ago

b9193

server : honor --embd-normalize CLI arg ( #23125 ) The --embd-normalize flag was registered only for the embedding and debug examples, so llama-server rejected it and the /embedding handler used a hard-coded default of 2 (L2). Add LLAMA_EXAMPLE_SERVER to the flag's example set…
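The entry identifies the default value 2 with L2 (Euclidean) normalization. As a minimal sketch of what that normalization does to an embedding vector, and not the server's actual code, the hypothetical helper `l2_normalize` below divides each component by the vector's Euclidean norm.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


unit = l2_normalize([3.0, 4.0])
print(unit)
print(math.sqrt(sum(x * x for x in unit)))
```

After normalization the dot product of two embeddings equals their cosine similarity, which is why unit-length output is a common default for embedding endpoints.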

7
llama.cpp releases dev-tools 1mo ago

b9192

ngram : reduce noisy logs ( #23185 ) ngram : reduce noisy logs macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu…

19
llama.cpp releases dev-tools 1mo ago

b9191

webui: support video files as input ( #22830 )

33
llama.cpp releases dev-tools 1mo ago

b9190

server: (router) alloc tmp buffer on heap ( #23159 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…

16
llama.cpp releases dev-tools 1mo ago

b9189

server: skip device enumeration in router mode to avoid creating CUDA primary context ( #23137 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU)…

7
llama.cpp releases dev-tools 1mo ago

b9186

sync : ggml macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64 (ROCm 7.2) Ubuntu x64 (OpenVINO)…

15
llama.cpp releases dev-tools 1mo ago

b9181

vendor : update cpp-httplib to 0.45.0 ( #23103 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu…

12
llama.cpp releases dev-tools 1mo ago

b9180

llama + spec: MTP Support ( #22673 ) spec: support MTP fix batch size rename files cont : simplify ( #7 ) MTP: clean-up ( #9 ) MTP: clean-up review: use llama_context_type instead of llama_graph_type review: remove llama_model_has_mtp review: fix convert issues convert: fix…

37
llama.cpp releases dev-tools 1mo ago

b9174

ui: Restructure repo to use tools/ui folder and ui / UI / llama-ui / LLAMA_UI naming ( #23064 ) webui: Move static build output from tools/server/public to build/ui directory refactor: Move to tools/ui refactor: rename CMake variables and preprocessor defines Rename…

36
llama.cpp releases dev-tools 1mo ago

b9173

ci : fix release symlinks ( #23119 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64 (ROCm…

33
llama.cpp releases dev-tools 1mo ago

b9172

webui: Use lowercase hash for HF checksum check ( #23107 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

24
llama.cpp releases dev-tools 1mo ago

b9169

mtmd: add chunks and fix preproc for qwen3a ( #23073 ) mtmd: add chunks and fix preproc for qwen3a add attn_mask limit mtmd_chunk size (avoid blow up memory) correct audio tokens re-order the set_input case remove attn_mask macOS/iOS: macOS Apple Silicon (arm64) macOS Apple…

7
llama.cpp releases dev-tools 1mo ago

b9165

ci : fix transform of top . entry in release archive ( #23080 ) fix transform of top . entry in release archive simplify macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64…

11
llama.cpp releases dev-tools 1mo ago

b9163

reasoning-budget: clone should do a deep-copy ( #23095 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…

36
llama.cpp releases dev-tools 1mo ago

b9161

Support for Codex CLI by skipping unsupported Responses tools ( #23041 ) Support for Codex CLI by skipping unsupported Responses tools Warn on skipped Responses tools and preserve gpt-oss apply_patch rejection Revert gpt-oss apply_patch special handling macOS/iOS: macOS Apple…

29
llama.cpp releases dev-tools 1mo ago

b9159

ggml-hexagon: cpy: add contiguous fast-path in reshape copy ( #23076 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…

4
llama.cpp releases dev-tools 1mo ago

b9158

HIP: RDNA3 mma FA, faster AMD transpose, tune AMD ( #22880 ) Adds RDNA3 support to the CUDA mma FA kernel. To make the RDNA3 tensor cores work with FP16 accumulation for VKQ, the tiles need to be 32 logical units long in the direction of the attention head; for head sizes 80…

25
llama.cpp releases dev-tools 1mo ago

b9156

ggml-webgpu: Enable NVIDIA self-hosted CI ( #22976 ) Enable nvidia ci for webgpu Address precision issues fix placement Relax more set_rows and div Try relaxing all f16 formatting and naming Add comment explaining max_nmse_err logic Added comment referencing pull request for…

21
llama.cpp releases dev-tools 1mo ago

b9151

logs : reduce ( #23021 ) logs : reduce args : fix envs server : fix build common : print verbosity level at start server : clean-up logs server : print prompt processing timings + sampling params minor : whitespaces macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon…

8
llama.cpp releases dev-tools 1mo ago

b9150

ggml-cpu: Add IME2 Instruction Support for the SpacemiT Backend ( #22863 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…

20
llama.cpp releases dev-tools 1mo ago

b9148

unicode,test: add Qwen3.5 non-backtracking tokenizer handler and regr… ( #22110 ) unicode,test: add Qwen3.5 non-backtracking tokenizer handler and regression tests Add unicode_regex_split_custom_qwen35() to src/unicode.cpp , a non-backtracking handler for Qwen3.5's [\p{L}\p{M}]+…

18
llama.cpp releases dev-tools 1mo ago

b9145

SYCL: fix multi-GPU system RAM exhaustion by using Level Zero allocations ( #21597 ) SYCL: fix multi-GPU system RAM exhaustion by using Level Zero allocations Replace sycl::malloc_device with zeMemAllocDevice for GPU memory allocation in the SYCL backend. sycl::malloc_device…

6
