Version Bump — AI news on Prismix

v0.30.1: llm: ignore llama-server SSE ping comments (#16443)

llama.cpp b9478 added a default 30s SSE ping that emits colon-only comment frames (":\n\n") while streamed requests are idle. Ollama treated non-data SSE lines as JSON, so this release skips SSE comments in completion and chat streams.
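The comment-skipping behaviour described above can be sketched in Python. This is a minimal illustration and not Ollama's actual implementation (which is written in Go). The function name `iter_sse_json` and the sample payloads are hypothetical, and the `[DONE]` sentinel is an assumption borrowed from OpenAI-style streaming endpoints. The only rule relied on from the SSE format is that a line starting with a colon is a comment and carries no data.

```python
import json
from typing import Iterable, Iterator


def iter_sse_json(lines: Iterable[str]) -> Iterator[dict]:
    """Yield JSON payloads from SSE lines, ignoring comments and blank lines.

    `iter_sse_json` is a hypothetical helper name used for illustration only.
    """
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # blank line marks an event boundary, no payload
        if line.startswith(":"):
            continue  # SSE comment, e.g. the colon-only keep-alive ping
        if not line.startswith("data:"):
            continue  # other SSE fields (event:, id:, retry:) carry no JSON here
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed OpenAI-style end-of-stream sentinel
            break
        yield json.loads(payload)


# Example stream with ping comments interleaved between data events.
sample = [
    ":\n", "\n",
    'data: {"content": "Hi"}\n', "\n",
    ":\n", "\n",
    'data: {"content": "!"}\n', "\n",
    "data: [DONE]\n",
]
chunks = list(iter_sse_json(sample))
```

Without the `startswith(":")` check, the ping line would reach `json.loads` and raise a decode error, which matches the failure mode the release note describes.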

36

Ollama releases dev-tools 29d ago

v0.30.1-rc0

launch: isolate Codex launch configuration (#16437)

7

r/MachineLearning community 29d ago

Backpropagation destroys V1 brain alignment in one epoch, tracking RSA alignment to fMRI across training for BP, FA, predictive coding, and STDP [R]

Third in a series of papers tracking learning rules vs. human fMRI (THINGS dataset, V1–IT, N=3 subjects). Previous finding: untrained CNNs match backprop at V1. This paper asks: when does training break that, and does the learning rule matter? Setup: RSA alignment measured at 8…

30

OpenAI Python SDK releases dev-tools 1mo ago

v2.40.0

2.40.0 (2026-06-01). Full Changelog: v2.39.0...v2.40.0. Features: api: Add Amazon Bedrock Responses support. Bug Fixes: api: allow setting bedrock api keys on the client directly (4d5bfde).

19

Ollama releases dev-tools 1mo ago

v0.30.0: launch: migrate Codex config (#16397)

launch: migrate Codex config

30

OpenAI Python SDK releases dev-tools 1mo ago

v2.39.0

2.39.0 (2026-06-01). Full Changelog: v2.38.0...v2.39.0. Features: api: workload identity in audit logs, additional_tools item in responses, fix ActionSearch.query to be optional (ab60d7a).

10

ComfyUI releases dev-tools 1mo ago

v0.23.0

What's Changed: feat: MediaPipe face detection (CORE-235) by @kijai in #14009; Multi-threaded load of models from disk (big load time speedups & Offload to disk) (CORE-43, CORE-152, CORE-164, CORE-165, CORE-117) by @rattus128 in #13802; Repo security stuff. by @comfyanonymous in #14019…

28

Ollama releases dev-tools 1mo ago

v0.30.0-rc32: llama-server followups (#16353)

llama-server followups. Misc fixes for #16031; Add back dropped ROCm build flag for multi-GPU support on Windows; Fix amdhip64_*.dll version detection for "latest" selection; Fix embeddings API for consistent normalize behavior with prior versions; ci: set up for automated llama.cpp…

19

r/LocalLLaMA community 1mo ago

mistral.rs v0.8.2: up to 2.8x faster CUDA inference than llama.cpp on GB10, B200, and H100

Hey all! I’ve been working on CUDA performance in mistral.rs, and v0.8.2 is focused on CUDA throughput. The result: on Gemma 4 (dense & MoE), mistral.rs is faster than llama.cpp at every point in my release sweep on GB10/H100/B200. See some results below on GB10 and B200:…

24

r/LocalLLaMA community 1mo ago

Llama Studio v0.2.0

I have made an update to my llama-server WebUI based on some awesome feedback and interaction with the community. 1) JSON model config replaced by per-model shell scripts. Run from CLI, paste from unsloth, email to your buddy or post to reddit: Using real shell scripts to store…

17

Hacker News — AI on Front Page community 1mo ago

The AV2 Video Standard Has Released (Final v1.0 Specification)

Article URL: https://av2.aomedia.org Comments URL: https://news.ycombinator.com/item?id=48340910 Points: 203 # Comments: 80

34

r/LocalLLaMA community 1mo ago

this new Moss tts 1.5 is damn good with voice cloning

https://huggingface.co/spaces/OpenMOSS-Team/MOSS-TTS-v1.5 I prefer this over fish audio s2 pro because fish audio doesn't allow commercial use. Long Cat DiT 3.5 is also another good model. submitted by /u/9r4n4y

38

vLLM releases dev-tools 1mo ago

v0.22.1rc0: [CI] Make Model Executor test hangs fail fast with a traceback (#43971)

Signed-off-by: khluu khluu000@gmail.com Co-authored-by: Claude noreply@anthropic.com

10

llama.cpp releases dev-tools 1mo ago

b9411

model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (#23346); llama : support DeepSeek V3.2 model family (with DSA lightning indexer); convert : handle DeepseekV32ForCausalLM architecture; ggml : support for f16 GGML_OP_FILL…

34

Ollama releases dev-tools 1mo ago

v0.30.0-rc31

ci fix - non-shallow MLX checkout

29

Ollama releases dev-tools 1mo ago

v0.30.0-rc30

version bump

18

Anthropic SDK (Python) releases dev-tools 1mo ago

v0.105.2

0.105.2 (2026-05-29) Full Changelog: v0.105.1...v0.105.2

14

Anthropic SDK (Python) releases dev-tools 1mo ago

v0.105.1

0.105.1 (2026-05-29). Full Changelog: v0.105.0...v0.105.1. Chores: internal: use Trusted Publishing for PyPI releases (1d04fc5).

34

Ollama releases dev-tools 1mo ago

v0.30.0-rc29

review comments

24

Anthropic SDK (Python) releases dev-tools 1mo ago

v0.105.0

0.105.0 (2026-05-28). Full Changelog: v0.104.1...v0.105.0. Features: api: Add support for claude-opus-4-8, mid-conversation system blocks, and usage.output_tokens_details (f18b014); support custom file size caps (#1825) (7e5f944). Chores: examples: rename managed-agents…

12

r/LocalLLaMA community 1mo ago

Krasis update: Qwen3.6-35B-A3B (Q4) at reading speed, 1x 8GB 3070 Mobile laptop (32GB RAM)

Context Krasis is an LLM runtime for running models that don't fit into VRAM. Krasis streams the model through VRAM from system RAM efficiently and handles prefill and decode as separate architectures and optimised usecases. Latest results (v1.0 release) 1x Laptop RTX 3070…

22

vLLM releases dev-tools 1mo ago

v0.22.0rc3: [BugFix] Fix hard-coded timeout for multi-API-server startup (#43768)

Signed-off-by: Vadim Gimpelson vadim.gimpelson@gmail.com Co-authored-by: Nick Hill nickhill123@gmail.com

20

vLLM releases dev-tools 1mo ago

v0.22.0: [BugFix] Fix hard-coded timeout for multi-API-server startup (#43768)

Signed-off-by: Vadim Gimpelson vadim.gimpelson@gmail.com Co-authored-by: Nick Hill nickhill123@gmail.com

29

vLLM releases dev-tools 1mo ago

v0.22.0rc2: Fix early CUDA init (#43791)

Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com (cherry picked from commit 41688e2)

11

Ollama releases dev-tools 1mo ago

v0.30.0-rc28

add OLLAMA_IGPU_ENABLE and largely disable iGPUs by default

14

ComfyUI releases dev-tools 1mo ago

v0.22.3

ComfyUI v0.22.3

36

r/MachineLearning community 1mo ago

Best Text to Text Translation Model? [D]

I'm working on a project that translates any language into English. So far, I've tried NMT models like NLLB, MADLAD, and SeamlessM4T v2. The main issue is that they struggle with proper nouns such as: - names - places - dates - organizations I also tried LLMs like Gemma 4, Qwen…

22

r/LocalLLaMA community 1mo ago

Info: Nvidia Cuda 13.3 landed

CUDA 13.3: Downloads, Release Notes. Has anybody already tried llama.cpp with 13.3? submitted by /u/parrot42

18

vLLM releases dev-tools 1mo ago

v0.22.0rc1: [MRV2][BugFix] Fix KV connector handling in spec decode case (#43719)

Signed-off-by: Nick Hill nickhill123@gmail.com Co-authored-by: Wentao Ye 44945378+yewentao256@users.noreply.github.com (cherry picked from commit 8c94938)

18

Ollama releases dev-tools 1mo ago

v0.30.0-rc27

ci: windows path workaround for CPU build

20

Ollama releases dev-tools 1mo ago

v0.30.0-rc26: Merge remote-tracking branch 'upstream/main' into llama-runner-phase-0

Conflicts: server/images.go server/images_test.go

33

r/LocalLLaMA community 1mo ago

OpenMOSS-Team/MOSS-TTS-v1.5 · Hugging Face

MOSS-TTS-v1.5 MOSS-TTS-v1.5 is continued from MOSS-TTS 1.0 . It preserves the main 1.0 capabilities, including zero-shot voice cloning, long-form speech generation, token-level duration control, Pinyin/IPA pronunciation control, multilingual synthesis, and code-switching. For…

10

r/LocalLLaMA community 1mo ago

Harbor v0.4.19 - vllm/sglang/llama.cpp launch codex/claude/pi/opencode

I'm usually not posting about Harbor releases out of the respect for the community here, but I think v0.4.19 might save a lot of people some time. Harbor can now launch your local agentic coding tools with local inference backends. For example, to run pi + vllm: # model…

26

Ollama releases dev-tools 1mo ago

v0.30.0-rc25

ci: fix WoA cross-compile

13

r/LocalLLaMA community 1mo ago

MiMo-V2.5-coder

Hi, I've just released MiMo-V2.5-coder. If you have 128 GB, this is an excellent alternative to Qwen3.6 and DS4, especially for coding. Fast, and with reliable tool calling. Give it a try! submitted by /u/jedisct1

7

Ollama releases dev-tools 1mo ago

v0.30.0-rc24

version bump

20

r/MachineLearning community 1mo ago

LQS v3.1 — an open methodology for rating AI training data (multi-oracle consensus + signed certificates) [P]

Solo author here. I spent the last six months building (and then sunsetting) a marketplace for AI training data. The marketplace failed for an interesting reason: the actual bottleneck isn't supply. There's tons of data. The bottleneck is that buyers can't independently evaluate…

14

r/LocalLLaMA community 1mo ago

BeeLlama v0.2.0 – major DFlash update. Single RTX 3090: Qwen 3.6 27B up to 164 tps (4.40x), Gemma 4 31B up to 177.8 tps (4.93x). Prompt processing speed near baseline.

BeeLlama v0.2.0 is here! Not quite a pegasus, but close enough. GitHub | Qwen 3.6 27B Quick Start | Gemma 4 31B Quick Start Full Gemma 4 31B support with efficient DFlash implementation and vision. Major Qwen 3.6 27B performance update from lower DFlash overhead, cleaner prefill…

28

ComfyUI releases dev-tools 1mo ago

v0.22.2

ComfyUI v0.22.2

6

r/LocalLLaMA community 1mo ago

trained a prompt injection detector using ml-intern and DeepSeek v4 Flash, runs in the browser

Trained a prompt injection classifier using ml-intern + DeepSeek v4 Flash. DistilBERT, F1 99%, ONNX int8, ~65 MB, runs in browser with Transformers.js v3. You can try it here: https://huggingface.co/spaces/av-codes/prompt-injection-detector --- I've been interested in prompt…

5

Ollama releases dev-tools 1mo ago

v0.30.0-rc23

lint fix

8

Anthropic SDK (Python) releases dev-tools 1mo ago

v0.104.1

0.104.1 (2026-05-21). Full Changelog: v0.104.0...v0.104.1. Bug Fixes: streaming: carry encrypted_content through beta compaction accumulator (#1821) (f7a720c).

29

Hacker News — AI on Front Page community 1mo ago

Deno 2.8

Article URL: https://deno.com/blog/v2.8 Comments URL: https://news.ycombinator.com/item?id=48234380 Points: 215 # Comments: 98

27

ComfyUI releases dev-tools 1mo ago

v0.22.1

ComfyUI v0.22.1

18

OpenAI Python SDK releases dev-tools 1mo ago

v2.38.0

2.38.0 (2026-05-21). Full Changelog: v2.37.0...v2.38.0. Features: api: api update (33d1d01); api: manual updates (a21700a); api: update OpenAPI spec or Stainless config (00265c5). Chores: api: docs updates (ee10152); check release PR custom code sync (2638779); remove release…

26

Anthropic SDK (Python) releases dev-tools 1mo ago

v0.104.0

0.104.0 (2026-05-21). Full Changelog: v0.103.1...v0.104.0. Features: api: Add support for thinking-token-count beta for estimated tokens in thinking block deltas when streaming (80d0fdf).

7

Ollama releases dev-tools 1mo ago

v0.30.0-rc22

version bump

5

r/LocalLLaMA community 1mo ago

LlamaStation v0.9 — llama.cpp GUI for Windows with multi-backend support, TurboQuant, MTP and more

I've been building this for the past few months as a side project — started because I didn't want to run llama.cpp from the command line every time I wanted to try a model. I just wanted something that worked with a click. Fair warning: I'm not a developer. This is 100% vibe…

33

ComfyUI releases dev-tools 1mo ago

v0.22.0

ComfyUI v0.22.0

30

llama.cpp releases dev-tools 1mo ago

b9246: snapdragon: update toolchain to v0.6 (#23369)

snapdragon: update compiler flags to enable all CPU features; snapdragon: update readme to point to toolchain v0.6; snapdragon: bump toolchain docker to v0.6

37
