Tag

Model releases

500 articles archived under #model-release · RSS

TechCrunch — AI news-outlet 2d ago

Cursor now has a mobile app for guiding your coding agent on the go

Cursor has launched a new mobile app for remote oversight over coding agents.

29
r/MachineLearning community 2d ago

I'm trying to implement CALM paper, and I have some questions. [P]

Hello, I'm trying to implement Pocket TTS by kyutai-labs, described in this paper. Since they haven't released the training/fine-tuning code, I'm trying to implement it on my own to learn. I have read the paper, tried to implement it with much more…

34
Simon Willison community 2d ago

Ornith-1.0: Self-Scaffolding LLMs for Agentic Coding

This is an interesting new open weights (MIT licensed) model, the first model release from DeepReinforce. [...] with variants including 9B Dense, 31B Dense, 35B MoE, and 397B MoE. Built on top of pretrained Gemma 4 and Qwen…

5
r/MachineLearning community 2d ago

Adaptive Mixture of Experts Gate (AMG) [R]

[Project] Post-hoc Adaptive MoE Gating on Qwen3.6-35B — empirical benchmarking of an open research gap Adaptive MoE routing — selecting a variable number of experts per token based on routing confidence — has been studied in papers (XMoE 2024, DynMoE ICLR 2025, TopP routing…

5
r/LocalLLaMA community 2d ago

Going from single GPU to dual GPU is nice but not in the way I expected

I was expecting that when doubling my VRAM from 24 GB to 2x24 GB I'd use higher quants with more context, and thus get smarter LLMs, but that's not what ended up happening. At least for coding, I found that the difference in quality from, say, qwen 27B UD-Q4-XL to a Q6 or Q8 is…

21
r/LocalLLaMA community 2d ago

Instead of decentralized training effort we should build the “One dataset”

There are many threads here calling for a united LLM training run for a new open model, mainly after the government's stunt of banning commercial frontier models, and also due to the lack of small-to-medium open-weight model releases lately. I genuinely believe at some point we’ll have “SETI…

38
Hacker News — AI on Front Page community 2d ago

Rocketlab acquires Iridium

Article URL: https://investors.rocketlabcorp.com/news-releases/news-release-details/rocket-lab-acquire-iridium-historic-deal-creating-fully
Comments URL: https://news.ycombinator.com/item?id=48719485
Points: 222
Comments: 136

25
r/LocalLLaMA community 3d ago

Deepseek V4 Official Launch to be released mid-July with API price changes

Is this the official release for DeepSeek? I hope it has huge improvements. https://preview.redd.it/dm5l0qn8k7ah1.png?width=694&format=png&auto=webp&s=12eadfd0a52c0f1a65bcd685f2cdbb29aff457be submitted by /u/jmorant555

22
llama.cpp releases dev-tools 3d ago

b9840

DeepSeek V4 (#24162) convert: add dsv4 conversion add basic setup add llm_graph_input_dsv4 add save-load state add sinkhorn eps - correction by @fairydreaming add rope fix cleanup dead code fix bugs support pro model: added by @fairydreaming remove redundant V cache Chat…

26
r/LocalLLaMA community 3d ago

DeepSeek V4 official version will be launch on mid-July

https://preview.redd.it/n7rwh262b7ah1.jpg?width=1024&format=pjpg&auto=webp&s=33d775b456843cd2dbd458de89384a6a7d6d87d1 Source: email sent by DeepSeek (only available to Chinese users). Used GPT Image 2 to translate the image into English. submitted by …

34
r/LocalLLaMA community 3d ago

DeepSeek V4 by am17an · Pull Request #24162 · ggml-org/llama.cpp

Now you can run DeepSeek V4 locally. submitted by /u/jacek2023

26
r/LocalLLaMA community 3d ago

GLM 5.2 Q1_S vs Qwen 27B Q8

TL;DR: GLM-5.2 Q1_S beats Qwen 3.6 27B Q8, both run at KV Q8 (edit: GLM ran with K and V at Q8, Qwen ran with KV cache at full FP16), with preserve thinking on. Disclaimer: This is a hobby/amateur comparison with n=1, so go easy on it. I just thought it would be fun to share. The…

11
r/LocalLLaMA community 3d ago

MiCA is now part of Hugging Face PEFT

Glad to share that MiCA, short for Minor Component Adaptation, has now been merged into the HuggingFace PEFT library. It is not yet included in the latest PyPI release, but you can already install it directly from PEFT main: pip install --upgrade…

18
Vercel — AI dev-tools 3d ago

Build realtime voice agents on AI Gateway

AI Gateway now supports audio/voice. You can add realtime voice, text to speech, and speech to text with the same calls you already use for text, image, and video, routed through AI Gateway alongside every other modality. Audio launches with models from OpenAI and xAI. Each…

26
Smol AI News news-outlet 3d ago

not much happened today

Meta announced Brain2Qwerty v2, a real-time non-invasive brain-to-text decoder achieving up to 78% word accuracy with released training code and dataset. Cursor launched Cursor for iOS with remote AI agents and live activity features. Open-weight model access…

35
arXiv — Machine Learning research 3d ago

Unified Zero-Shot Time Series Forecasting: A Darts Foundation

arXiv:2606.27438v1 Announce Type: new Abstract: Since its initial release in 2020, Darts has become a widely used open-source Python library for time series analysis. A series of foundation models have recently claimed accuracy improvements in zero-shot forecasting, promising a…

15
arXiv — Machine Learning research 3d ago

FoggyTrust: Robust Federated Learning with Hierarchical Trust Networks

arXiv:2606.27622v1 Announce Type: new Abstract: Byzantine-robust federated learning seeks to protect distributed model training from malicious or corrupted clients without requiring access to their private data. FLTrust addresses this challenge by introducing a trusted…

33
arXiv — Machine Learning research 3d ago

Recovering Sharp Conductivity Features in the Finite-Data Calderón Problem with Physics-Informed Neural Networks

arXiv:2606.28158v1 Announce Type: new Abstract: Physics-informed neural networks (PINNs) have recently emerged as a promising framework for addressing the Calderón inverse problem from limited boundary data. In this work, we revisit neural Calderón inversion by introducing…

31
arXiv — Machine Learning research 3d ago

Qwen-Image-2.0-RL Technical Report

arXiv:2606.27608v1 Announce Type: cross Abstract: We present Qwen-Image-2.0-RL, a post-training pipeline that applies reinforcement learning from human feedback (RLHF) and on-policy distillation (OPD) to improve both the visual quality and instruction-following capability of the…

34
r/LocalLLaMA community 3d ago

I built an agent Harness for Small Models. I got Qwen 3.5 4b managing servers.

This is something I've been working on. I like playing around with smaller local models but found most agent harnesses not well suited for them. The failure modes across different model families tend to be the same: failed tool calls, poor verification of environment variables, poor…

12
Vercel — AI dev-tools 3d ago

xAI Grok audio models now available on Vercel AI Gateway

xAI's audio models are now live on AI Gateway. Realtime voice, text to speech, and speech to text are all available through the AI SDK with the same routing, observability, and spend controls as your other models. These capabilities are available on the AI SDK 7 release.…

11
r/LocalLLaMA community 3d ago

Qwen3.6-27B UD Q3 with kv at q8 is quite amazing for simple proof of concepts

Preface: technology is not my industry, but I am a very passionate poor man. So much so that I discovered 'AI' (ChatGPT) at the beginning of 2025. So go easy on me, I only try. I kind of understand MoE vs. dense models; MoEs are much more forgiving when it comes to running as there…

22
r/LocalLLaMA community 3d ago

Tensor split performance on low-bandwidth (TB3) eGPUs, and a question

Hey everyone! I've got a pair of Morefine G1 4090M 16gb eGPUs connected at 40Gbps via TB3 (daisy-chained). I normally run them in layer split mode as it doesn't seem to need much bandwidth; I'm seeing around 1300t/s PP and 26t/s TG (35-40 with MTP), qwen3.6-27B @ Q4. Which is…

20
TechCrunch — AI news-outlet 3d ago

Ford rehires ‘gray beard’ engineers after AI falls short

“Mistakenly we thought that by just introducing artificial intelligence ... that would produce a high-quality product.”

19
Hacker News — AI on Front Page community 3d ago

GLM 5.2 beats Claude in our benchmarks

Article URL: https://semgrep.dev/blog/2026/we-have-mythos-at-home-glm-52-beats-claude-in-our-cyber-benchmarks/
Comments URL: https://news.ycombinator.com/item?id=48709670
Points: 273
Comments: 109

22
OpenAI official-blog 3d ago

HP Inc. launches Frontier strategic partnership with OpenAI

HP Inc. scales its OpenAI Frontier partnership to deploy AI across customer experiences, software development, and enterprise operations.

18
Hacker News — AI on Front Page community 3d ago

I used Claude Code to get a second opinion on my MRI

Article URL: https://antoine.fi/mri-analysis-using-claude-code-opus
Comments URL: https://news.ycombinator.com/item?id=48708941
Points: 206
Comments: 309

38
r/LocalLLaMA community 3d ago

Script to monitor llama cpp and analyze memory usage

My goal has always been to be productive with commodity hardware. So far my workhorses have been the MoE editions of Gemma 4 and Qwen 3.6 on an old desktop with a single 9060XT with 16GB ram. The problem has always been that every source is vague about VRAM/RAM requirements.…

33
Don't Worry About the Vase community 3d ago

GPT-5.6: The System Card

While we wait for a general release, the system card is the best hint as to what is going on with the new candidate for America’s Next Top Model, GPT-5.6.

5
Hacker News — AI on Front Page community 3d ago

EU to legislate about Chat Control behind closed doors

Article URL: https://www.patrick-breyer.de/en/double-threat-to-private-communications-undemocratic-chat-control-backroom-deals-and-imminent-concessions-spark-relaunch-of-fightchatcontrol-eu/
Comments URL: https://news.ycombinator.com/item?id=48707719
Points: 249
Comments: 132

11
r/LocalLLaMA community 3d ago

DeepSpec - a deepseek-ai Collection

DeepSpec is a full-stack codebase for training and evaluating draft models for speculative decoding. It contains data preparation utilities, draft model implementations, training code, and evaluation scripts.
Released Checkpoints: The checkpoints below are the ones used…

26
r/LocalLLaMA community 4d ago

Qwen3.6 27B local vs Opus 4.8, voxel engine in raw C with zero frameworks

Sunday experiment. Same prompt to both. Build a voxel world in plain C. No engine, no game library, no framework, just the compiler. The model does its own chunk meshing, render loop and memory management by hand. Left is Claude Code on Opus 4.8. Right is Qwen3.6 27B local on…

37
r/LocalLLaMA community 4d ago

How many of you do use Q1 or Q2 of Big models(100-250B)? How's it?

Sharing popular (also recent) models for reference:
151-250B: DeepSeek-V4-Flash, Step-3.X-Flash, Command-a-plus-05-2026, Laguna-M.1, MiniMax-M2.X, Qwen3-235B-A22B
100-150B: GLM-4.5-Air, Qwen3.5-122B-A10B, NVIDIA-Nemotron-3-Super-120B-A12B, Mistral-Small-4-119B-2603…

34
llama.cpp releases dev-tools 4d ago

b9830

common : allow --offline in llama download (#25091) Expose the existing --offline flag to llama download so a script can run it to check whether a model is already cached and ready to be served without touching the network. Also fix a latent use-after-free in the URL-task…

4
r/LocalLLaMA community 4d ago

A barebones CPU-only inference engine for Qwen 3, written from scratch in pure C

TL;DR: The (very messy) code and writeups can be found at https://github.com/jakint0sh/qwen3-engine
Read the README for instructions on how to get started. And for those who just want a bulleted list:
- Inference engine for Qwen 3 sizes 4B and below
- Written from scratch in…

37
r/LocalLLaMA community 4d ago

Is Qwen3-VL-2B the only viable VLM for JSON extraction on a "potato"?

After spending countless hours testing on 3 "potato" laptops (Intel i3, 8GB RAM, Win11, integrated GPU), that's my conclusion. For reliably extracting data from images to JSON on low-end hardware, nothing else even comes close. Yet, it’s completely missing from major benchmarks…

23
r/LocalLLaMA community 4d ago

10x Kaioken SSJ1 4th grade, worth it in 2026? Can it run Qwen3.6?

submitted by /u/Ice94k

20
r/LocalLLaMA community 4d ago

Koboldcpp v1.116 released

submitted by /u/Fcking_Chuck

19
r/LocalLLaMA community 4d ago

I had 55 LLMs blind-grade each other (22k judgments, all open). Every model family with enough data is biased toward its own siblings. Qwen judges favor Qwen by ~0.9 points. Mistral penalizes its own by ~1.0.

I have been running an open evaluation setup where N models answer the same prompt, then blind-grade each other in an N x N matrix with self-judgments excluded. No single privileged judge. So far: 286 evaluations, 198 hand-written questions, 22,254 valid judgments across 55…

35
r/LocalLLaMA community 4d ago

2x RX 9060xt 16gb, is it worth it?

I'm planning to buy 2x RX 9060xt with 16gb each to run Qwen 3.6 27B and the like. Would it be a good investment? How many tk/s should I expect in generation and prefill? I'm planning to use this as a coding agent in a large codebase. Currently I'm running this on my i7 64gb laptop…

35
Hugging Face Daily Papers research 4d ago

COrigami: An AI Pipeline for Co-Designing Flat-Foldable Visually Recognisable Origami

Abstract: A computational origami system generates crease patterns from natural language using AI-driven optimization and aesthetic evaluation, enabling human-AI collaboration in mathematically constrained design. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) While generative AI…

11
r/LocalLLaMA community 4d ago

I built a tool to turn your Claude Code sessions into fine-tuning data for local models

If you use Claude Code, every session is already sitting on disk as a .jsonl file under ~/.claude/projects/. It has real coding conversations: multi-turn edits, tool calls, reasoning traces. That's training data you already generated for free. The problem is the format is not…

36
r/LocalLLaMA community 4d ago

Full document redaction with Qwen 3.6 27B with a Pi agent harness

Link to full blog post with all method details, results, and links to all relevant code/skills/prompts at the bottom of this post. Apologies for not having more links throughout, it seems this subreddit restricts too many links in posts. Document redaction tasks are complex…

10
The Algorithmic Bridge news-outlet 4d ago

The AI Industry as You Know It Died Today

OpenAI announced GPT-5.6 but a terrible thing has happened

30
Hugging Face Daily Papers research 4d ago

Fast LeWorldModel

Abstract: Fast-LeWM accelerates visual planning by replacing autoregressive rollout with parallel action-prefix prediction, reducing computational costs and latency accumulation during long-horizon predictions. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Joint-Embedding…

20
r/LocalLLaMA community 4d ago

Ornith 35B is great so far

Tried creating a quick 3D game with it; after 3 prompts, it got me this (check video). By comparison, qwen3.5-35b-a3b was not able to generate this successfully and kept failing even after multiple prompts. Harness: Claude Code. How is your experience so far?…

4
r/LocalLLaMA community 4d ago

Mythos was the first, now GPT-5.6

https://techcrunch.com/2026/06/26/openai-limits-gpt-5-6-rollout-after-government-request-says-restrictions-shouldnt-be-the-norm/ Either it's hype before the IPO, or they have just shot themselves in the foot. This is pretty much it for more advanced online models. Local LLM is one of the…

17
r/LocalLLaMA community 5d ago

[NEW MODEL] - SupraSafety-18M · Tiny Content-Moderation Model

Hey r/LocalLLaMA! SupraLabs is back with a new model: SupraSafety-18M. It's a BERT-style 18M params model trained from scratch on 2 T4 GPUs in Kaggle on the nvidia/Nemotron-3.5-Content-Safety-Dataset dataset for 7 epochs. It's built to run on edge devices, mobile phones, or…

13
TechCrunch — AI news-outlet 5d ago

Asian AI startups launch Mythos-like models as Anthropic’s export ban drags on

New models are launching in Asia that promise Mythos-like capabilities without fear of an export ban. U.S. AI labs may never recover this enormous market.

29
r/LocalLLaMA community 5d ago

We built a calibration-aware Q4_K_M quant of Qwen3.5 0.8B that recovers 96.5% of the BF16 gap vs pure llama.cpp Q4_K_M (SpectralQuant)

Hey everyone, We just released our first release candidate from Spectral Labs: a Qwen3.5 0.8B Q4_K_M built using a new calibration-aware quantization approach we're calling SpectralQuant. The goal here was to see if we could make a standard Q4_K_M footprint behave more like a…

15
