Tag

Model releases

500 articles archived under #model-release · RSS

r/LocalLLaMA community 1d ago

Claude Code Is Steganographically Marking Requests

  submitted by   /u/johnnyApplePRNG [link]   [comments]

21
r/LocalLLaMA community 1d ago

DeepSeek-V4-Flash (MXFP4): compute buffer scales ~3x just from KV cache quant type (f16 vs q8_0) — anyone else seeing this? Llama.cpp

Bartowski's DeepSeek-V4-Flash-MXFP4 GGUF, llama.cpp build 9851 ( 0eca4d490 ), deepseek4 arch. Ran the same n_ctx = 10240 , same n_ubatch = n_batch = 8192 , flash attention on — only difference is -ctk / -ctv : Cache type Total KV cache (CUDA0) CUDA0 compute buffer f16 (default,…

18
Vercel — AI dev-tools 1d ago

Resend joins the Vercel Marketplace

Resend is now available on the Vercel Marketplace , allowing Vercel teams to send emails from their applications without managing any infrastructure. This integration helps developers start sending messages within minutes, build emails as React components, and track delivery…

5
Simon Willison community 1d ago

Quoting Anthropic

We’ve received notice that the Department of Commerce has lifted export controls on Claude Fable 5 and Mythos 5. We'll begin restoring access tomorrow, and will share an update soon. — Anthropic , on Twitter Tags: anthropic , claude , generative-ai , claude-mythos , ai ,…

34
Hacker News — AI on Front Page community 1d ago

Department of Commerce has lifted export controls on Claude Fable 5 and Mythos 5

Article URL: https://twitter.com/AnthropicAI/status/2072106151890809341 Comments URL: https://news.ycombinator.com/item?id=48740771 Points: 232 # Comments: 90

6
Hugging Face Daily Papers research 1d ago

MirrorPPR: Exemplar-Based Portrait Photo Retouching

Abstract Exemplar-based portrait retouching framework using Diffusion Transformer with LoRA adaptation and self-augmented training data achieves superior quality and identity preservation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct While text-guided image editing has made…

24
Simon Willison community 1d ago

Nano Banana 2 Lite

Nano Banana 2 Lite Also known as Gemini 3.1 Flash Lite Image ( gemini-3.1-flash-lite-image in their API ), this is the "fastest and cheapest Gemini image model, engineered for velocity and scale". I used AI studio to run this prompt: Do a where's Waldo style image but it's where…

30
MIT Technology Review — AI news-outlet 1d ago

Claude Science is Anthropic’s newest flagship product

At an event for pharmaceutical executives, biotech founders, and researchers on Tuesday, Anthropic announced Claude Science, a major new product intended to support scientific research in the same way that Claude Code supports software engineering. Like Claude Code, Claude…

25
Simon Willison community 1d ago

What's new in Claude Sonnet 5

What's new in Claude Sonnet 5 Claude Sonnet 5 came out this morning . I always head straight for the "what's new" developer docs because they tend to have more actionable information than the official announcement post. Anthropic say of Sonnet 5 that "its performance is…

26
Hacker News — AI on Front Page community 1d ago

I ported Kubernetes to the browser

https://github.com/ngrok/webernetes https://webernetes-demo.ngrok.app/ Comments URL: https://news.ycombinator.com/item?id=48738985 Points: 263 # Comments: 80

21
Hugging Face Daily Papers research 1d ago

Delayed Verification Destabilizes Multi-Agent LLM Belief: Instability Thresholds and Optimal Corrector Placement

Abstract Delayed verification in multi-agent LLM systems can cause instability leading to oscillations, but grounded factual answering stabilizes the system by making truth an absorbing boundary. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multi-agent large language model (LLM)…

14
r/LocalLLaMA community 1d ago

Agents-A1 GGUF quants (35B Qwen3.5-MoE agent model) — NVFP4 for Blackwell + working MTP speculative decoding (up to 1.22× single-user, 91% draft acceptance)

Repo → huggingface.co/LordNeel/Agents-A1-GGUF I made GGUF quants of InternScience/Agents-A1 — a 35B Mixture of Experts agent model (Qwen3.5-MoE, ~3B active, 256 experts / 8+1 active, hybrid linear+full attention, 256K context). It's built for long-horizon search, tool-calling,…

27
r/LocalLLaMA community 1d ago

Devs - you have 64gb of VRAM - which model do you use for coding?

I've currently settled on an unsloth version of Qwen 3.5 122b-a10b model (UD-IQ4_NL). With 100k bf16 context window, I only had to load a few layers into CPU/RAM, it runs around 30 tok/sec which is fine for me. I've tested many models, hours of testing but I am currently deeply…

32
r/LocalLLaMA community 1d ago

Dual RTX 6000, for Deepseek v4 Flash???

My last post got a lot of interaction asking 6000 pro owners if they regretted, the answer was hard NO. I ended up understanding that dual rtx 6000 pro run deepseek v4 flash extremely fast. I went to the near stores and got offers around $50-60k for dual rtx 6000 pro ai server.…

36
Hugging Face Daily Papers research 1d ago

LLM Program Optimization via Retrieval Augmented Search

Abstract Blackbox adaptation methods using retrieval-augmented search and atomic edit decomposition improve program optimization performance for both C++ and Python code. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Recent work has demonstrated the potential of large language…

19
r/LocalLLaMA community 1d ago

The harness matters more than the model. A 27B behind good critics changed my mind.

I saw someone test Qwen3.6-27B with a 3-critic harness. The harness included code review, test review and Playwright e2e. Each critic had context. The result was that the model is usable for coding work. This matches what I have come to believe from running agents in production.…

20
TechCrunch — AI news-outlet 1d ago

Anthropic launches Claude Sonnet 5 as a cheaper way to run agents

Anthropic’s Claude Sonnet 5 brings stronger agentic capabilities, lower pricing, and improved safety, positioning the model as a cheaper alternative to Opus, GPT-5.5, and Gemini Pro.

5
Hacker News — AI on Front Page community 1d ago

Claude Sonnet 5

Article URL: https://www.anthropic.com/news/claude-sonnet-5 Comments URL: https://news.ycombinator.com/item?id=48736605 Points: 300 # Comments: 145

8
Anthropic SDK (Python) releases dev-tools 1d ago

v0.114.0

0.114.0 (2026-06-30) Full Changelog: v0.113.0...v0.114.0 Features api: add support for claude-sonnet-5 ( b893033 ) Bug Fixes agent_toolset: allow absolute paths that resolve inside workdir ( #121 ) ( 0105529 )

9
Hugging Face Daily Papers research 1d ago

Mind the Heads: Topological Representation Alignment for Multimodal LLMs

Abstract HeRA aligns individual attention heads in MLLMs to preserve local neighborhood relationships across modalities, improving vision-centric task performance and reducing visual hallucinations. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Representation alignment has…

27
r/LocalLLaMA community 1d ago

HIP: use hipBLAS for dense prefill on gfx900, keep MMQ for MoE by DEV-DUFORD · Pull Request #24588 · ggml-org/llama.cpp

Overall Performance Gains: Qwen3.5 4B : +36.1% Qwen3.6 27B : +18.9% Gemma4 12B : +65.1% Overall average : ~40% Only for gfx900 related GPUs: Vega GPU, codename vega10, including Radeon Vega Frontier Edition, Radeon RX Vega 56/64, Radeon RX Vega 64 Liquid, Radeon Pro Vega…

5
Hacker News — AI on Front Page community 1d ago

Claude Science

Article URL: https://claude.com/product/claude-science Comments URL: https://news.ycombinator.com/item?id=48735770 Points: 371 # Comments: 122

28
TechCrunch — AI news-outlet 1d ago

Anthropic’s Claude Science bets on workflow, not a new model, to win over scientists

Anthropic's Claude Science is a workbench that gives scientists one environment to do computational research, saving them from the need to bounce between databases, pipelines, and tools.

18
llama.cpp releases dev-tools 1d ago

b9850

model : register t_layer_inp for qwen3next ( #25141 ) Fix input assignment in layer processing loop Fix DFLASH for qwen-coder-next add line break Added tensor for attention normalization in Qwen3 model. macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI…

37
Simon Willison community 1d ago

Have your agent record video demos of its work with shot-scraper video

shot-scraper video is a new command introduced in today's shot-scraper 1.10 release which accepts a storyboard.yml file defining a routine to run against a web application and uses Playwright to record a video of that routine. I've written before about the importance of having…

31
Hugging Face Daily Papers research 1d ago

SWE-Together: Evaluating Coding Agents in Interactive User Sessions

Abstract SWE-Together is a multi-turn coding benchmark created from real user-agent interactions, featuring a reactive LLM simulator to evaluate agents based on both final correctness and interaction efficiency. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Most coding-agent…

32
Hugging Face Daily Papers research 1d ago

DreamForge-World 0.1 Preview: A Low-Compute Real-Time Controllable World Model

Abstract DreamForge-World 0.1 Preview adapts a video generation architecture with a residual action pathway to enable real-time interactive world simulation on consumer hardware with low computational requirements. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We present…

18
Google DeepMind official-blog 1d ago

Start building with Nano Banana 2 Lite and Gemini Omni Flash

Start building with Nano Banana 2 Lite and Gemini Omni Flash Jun 30, 2026 · Share x.com Facebook LinkedIn Mail We’re making it easier to experiment and scale your ideas with Nano Banana 2 Lite, our fastest, most cost-efficient Gemini Image model, and Gemini Omni Flash for…

20
Hacker News — AI on Front Page community 1d ago

Claude Code Is Steganographically Marking Requests

Article URL: https://thereallo.dev/blog/claude-code-prompt-steganography Comments URL: https://news.ycombinator.com/item?id=48734373 Points: 303 # Comments: 97

23
Hugging Face Daily Papers research 1d ago

RocketSmith: Agentic Additive Manufacturing of High-Powered Rockets

Abstract An agentic system using large language models automates high-power rocket design processes, enabling successful flight testing with consistent simulation results. Generated by Qwen/Qwen2.5-Coder-32B-Instruct RocketSmith is an agentic system which intelligently automates…

9
Hugging Face Daily Papers research 1d ago

A Gravitational Interpretation of Fine-Tuning Reversion

Abstract Post-alignment safety degradation arises from geometric properties of training history, where fine-tuning reversion follows a persistent direction defined by early training dynamics. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Fine-tuning on harmless data can partially…

35
Hugging Face Daily Papers research 1d ago

One Model, Many Latencies: Universal Speech Enhancement for Diverse Real-Time Applications

Abstract A universal speech enhancement model with configurable algorithmic and computational latency controls using parallel convolutions and early-exit mechanisms. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Different real-time speech applications impose distinct latency…

9
Simon Willison community 1d ago

shot-scraper 1.10

Release: shot-scraper 1.10 The big new feature is shot-scraper video storyboard.yml , described in detail in Have your agent record video demos of its work with shot-scraper video . Tags: shot-scraper

11
TechCrunch — AI news-outlet 1d ago

X now offers an MCP server to make its platform easier for AI tools to use

X has launched a hosted MCP server, making it easier for developers to connect AI applications with the company’s API.

24
TechCrunch — AI news-outlet 1d ago

Amazon launches new $1 billion FDE org, following OpenAI and Anthropic

Engineers on the new team will embed within companies to deploy purpose-built agents, focusing on fast deployments and customer self-sufficiency.

28
Hugging Face Daily Papers research 1d ago

SAM2Matting: Generalized Image and Video Matting

Abstract SAM2Matting advances video matting by decoupling tracking and matting tasks through a tracker-to-matting framework that leverages foundational trackers with region-proposal bridges and dedicated matting heads. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Despite…

36
r/LocalLLaMA community 1d ago

PageStorm: A Model Built for Creative Book Writing

Over a year ago, we set out to build a single-turn full-book writing model. Half a year ago, we published our LongPage Dataset for book scale creative writing. Today, we are announcing our first model: PageStorm Research Preview. Paper: https://arxiv.org/abs/2605.17064 Models:…

9
Hugging Face Daily Papers research 1d ago

One-Step Gradient Delay is Not a Barrier for Large-Scale Asynchronous Pipeline Parallel LLM Pretraining

Abstract Asynchronous pipeline parallelism with PipeDream-2BW can achieve near-synchronous performance through optimizer selection and error feedback correction, overcoming traditional stability concerns. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Modern large-scale LLM…

29
r/LocalLLaMA community 1d ago

I benchmarked full tool catalog vs ranked catalog on a local model: 8% → 77% accuracy

Been running agents locally for a while and kept hitting the same issue: the more tools I added, the worse the model got at picking the right one.. So I finally benchmarked it properly.. Setup: qwen3.5-class model on an M4 MacBook, 100 tools in the catalog. One run with the full…

23
r/LocalLLaMA community 1d ago

HydraHead: From Head-Level Functional Heterogeneity to Specialized Attention Hybridization (from the Qwen team)

The quadratic complexity of attention poses a critical bottleneck for long-context processing, spurring interest in hybrid attention designs. Most open-source hybrid models adopt a layer-wise strategy. Yet, prior work has noted the inherent difficulty of integrating Linear…

9
r/LocalLLaMA community 1d ago

Qwen 3.6 27B Speculative Decoding Bench: Pushing ~100 TPS on a single RTX 3090

First of all, a huge thank you to the r/LocalLLaMA community and the 3090 club. This benchmark started from your shared recipes... These are my findings on my hardware (Xeon E5-2666v3, 64GB RAM, single RTX 3090 24GB) comparing 5 engines (3 llama.cpp forks + mainline + Lucebox)…

12
Hugging Face Daily Papers research 1d ago

RaysUp: Ultra-light Universal Feature Upsampling via Geometry-Aware Ray Representation

Abstract RaysUp is a lightweight, task-agnostic feature upsampling framework that reconstructs high-resolution features using geometry-aware ray domain techniques with improved efficiency and accuracy. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Pre-trained Vision Foundation…

37
r/LocalLLaMA community 1d ago

Huawei open-sources OpenPangu-2.0-Flash - 92B total,6B active

https://x.com/Chinazhidx/status/2071877413685109071 TODAY: #Huawei open-sources OpenPangu-2.0-Flash #OpenPangu 2.0 includes two 512K-context models: • Flash: 92B total,6B active—Weights+inference code+training ops released • Pro: 505B total,18B active—flagship model, coming in…

38
Hugging Face Daily Papers research 1d ago

Illuminating Unified Multimodal Model for Free-form Interleaved Text-Image Generation

Abstract ILLUME-X is a unified multimodal paradigm that enhances text-image generation through improved data efficiency, stable training processes, and comprehensive evaluation metrics. Generated by Qwen/Qwen2.5-Coder-32B-Instruct The advancement of generative AI models capable…

17
r/LocalLLaMA community 1d ago

Bartowski has delivered DS4 GGUF

Looking forward to compare with Antirez's DS4 imamtrix https://huggingface.co/bartowski/DeepSeek-V4-Flash-GGUF   submitted by   /u/challis88ocarina [link]   [comments]

31
r/LocalLLaMA community 1d ago

MTP-only GGUF subsets: Qwen3.5/3.6

They are just MTP-only GGUF subsets of Qwen3.5/3.6 Medium/Large (27B and above) models (to accelerate token generation of Qwen-based models without MTP tensors ). But I hope they help experimenting with various Qwen3.5/3.6-based fine-tunes. The reason I originally created some…

30
r/LocalLLaMA community 1d ago

nvidia/Qwen3.6-27B-NVFP4 just dropped

https://huggingface.co/nvidia/Qwen3.6-27B-NVFP4   submitted by   /u/vanbukin [link]   [comments]

37
Hugging Face Daily Papers research 1d ago

Beyond IID: How General Are Tabular Foundation Models, Really?

Abstract Tabular foundation models show varying performance across different data conditions, with traditional methods still outperforming newer approaches on complex, large-scale datasets. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Foundation models for predictive machine…

15
r/LocalLLaMA community 1d ago

Norm-preserving abliteration on Qwen3.6-35B-A3B: 0% refusal, benchmarks intact, open source dataset

Been reading the mechanistic interpretability literature on refusal for a while now. The core insight from Arditi et al. (2024) is clean: refusal is mediated by a geometrically consistent direction in the residual stream. You can find it via the difference of means between…

4
Hugging Face Daily Papers research 1d ago

Agentic Abstention: Do Agents Know When to Stop Instead of Act?

Abstract Agentic abstention involves determining when an AI agent should cease interaction under uncertainty, requiring sequential decision-making across multiple environments and task types. Generated by Qwen/Qwen2.5-Coder-32B-Instruct LLM agents are expected to act over…

13

Claude Code Is Steganographically Marking Requests

DeepSeek-V4-Flash (MXFP4): compute buffer scales ~3x just from KV cache quant type (f16 vs q8_0) — anyone else seeing this? Llama.cpp

Resend joins the Vercel Marketplace

Quoting Anthropic

Department of Commerce has lifted export controls on Claude Fable 5 and Mythos 5

MirrorPPR: Exemplar-Based Portrait Photo Retouching

Nano Banana 2 Lite

Claude Science is Anthropic’s newest flagship product

What's new in Claude Sonnet 5

I ported Kubernetes to the browser

Delayed Verification Destabilizes Multi-Agent LLM Belief: Instability Thresholds and Optimal Corrector Placement

Agents-A1 GGUF quants (35B Qwen3.5-MoE agent model) — NVFP4 for Blackwell + working MTP speculative decoding (up to 1.22× single-user, 91% draft acceptance)

Devs - you have 64gb of VRAM - which model do you use for coding?

Dual RTX 6000, for Deepseek v4 Flash???

LLM Program Optimization via Retrieval Augmented Search

The harness matters more than the model. A 27B behind good critics changed my mind.

Anthropic launches Claude Sonnet 5 as a cheaper way to run agents

Claude Sonnet 5

v0.114.0

Mind the Heads: Topological Representation Alignment for Multimodal LLMs

HIP: use hipBLAS for dense prefill on gfx900, keep MMQ for MoE by DEV-DUFORD · Pull Request #24588 · ggml-org/llama.cpp

Claude Science

Anthropic’s Claude Science bets on workflow, not a new model, to win over scientists

b9850

Have your agent record video demos of its work with shot-scraper video

SWE-Together: Evaluating Coding Agents in Interactive User Sessions

DreamForge-World 0.1 Preview: A Low-Compute Real-Time Controllable World Model

Start building with Nano Banana 2 Lite and Gemini Omni Flash

Claude Code Is Steganographically Marking Requests

RocketSmith: Agentic Additive Manufacturing of High-Powered Rockets

A Gravitational Interpretation of Fine-Tuning Reversion

One Model, Many Latencies: Universal Speech Enhancement for Diverse Real-Time Applications

shot-scraper 1.10

X now offers an MCP server to make its platform easier for AI tools to use

Amazon launches new $1 billion FDE org, following OpenAI and Anthropic

SAM2Matting: Generalized Image and Video Matting

PageStorm: A Model Built for Creative Book Writing

One-Step Gradient Delay is Not a Barrier for Large-Scale Asynchronous Pipeline Parallel LLM Pretraining

I benchmarked full tool catalog vs ranked catalog on a local model: 8% → 77% accuracy

HydraHead: From Head-Level Functional Heterogeneity to Specialized Attention Hybridization (from the Qwen team)

Qwen 3.6 27B Speculative Decoding Bench: Pushing ~100 TPS on a single RTX 3090

RaysUp: Ultra-light Universal Feature Upsampling via Geometry-Aware Ray Representation

Huawei open-sources OpenPangu-2.0-Flash - 92B total,6B active

Illuminating Unified Multimodal Model for Free-form Interleaved Text-Image Generation

Bartowski has delivered DS4 GGUF

MTP-only GGUF subsets: Qwen3.5/3.6

nvidia/Qwen3.6-27B-NVFP4 just dropped

Beyond IID: How General Are Tabular Foundation Models, Really?

Norm-preserving abliteration on Qwen3.6-35B-A3B: 0% refusal, benchmarks intact, open source dataset

Agentic Abstention: Do Agents Know When to Stop Instead of Act?