News / #model-release Tag Model releases 500 articles archived under #model-release · RSS Sign in to follow r/LocalLLaMA community 1d ago Claude Code Is Steganographically Marking Requests   submitted by   /u/johnnyApplePRNG [link]   [comments] 21 r/LocalLLaMA community 1d ago DeepSeek-V4-Flash (MXFP4): compute buffer scales ~3x just from KV cache quant type (f16 vs q8_0) — anyone else seeing this? Llama.cpp Bartowski's DeepSeek-V4-Flash-MXFP4 GGUF, llama.cpp build 9851 ( 0eca4d490 ), deepseek4 arch. Ran the same n_ctx = 10240 , same n_ubatch = n_batch = 8192 , flash attention on — only difference is -ctk / -ctv : Cache type Total KV cache (CUDA0) CUDA0 compute buffer f16 (default,… 18 Vercel — AI dev-tools 1d ago Resend joins the Vercel Marketplace Resend is now available on the Vercel Marketplace , allowing Vercel teams to send emails from their applications without managing any infrastructure. This integration helps developers start sending messages within minutes, build emails as React components, and track delivery… 5 Simon Willison community 1d ago Quoting Anthropic We’ve received notice that the Department of Commerce has lifted export controls on Claude Fable 5 and Mythos 5. We'll begin restoring access tomorrow, and will share an update soon. — Anthropic , on Twitter Tags: anthropic , claude , generative-ai , claude-mythos , ai ,… 34 Hacker News — AI on Front Page community 1d ago Department of Commerce has lifted export controls on Claude Fable 5 and Mythos 5 Article URL: https://twitter.com/AnthropicAI/status/2072106151890809341 Comments URL: https://news.ycombinator.com/item?id=48740771 Points: 232 # Comments: 90 6 Hugging Face Daily Papers research 1d ago MirrorPPR: Exemplar-Based Portrait Photo Retouching Abstract Exemplar-based portrait retouching framework using Diffusion Transformer with LoRA adaptation and self-augmented training data achieves superior quality and identity preservation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct While text-guided image editing has made… 24 Simon Willison community 1d ago Nano Banana 2 Lite Nano Banana 2 Lite Also known as Gemini 3.1 Flash Lite Image ( gemini-3.1-flash-lite-image in their API ), this is the "fastest and cheapest Gemini image model, engineered for velocity and scale". I used AI studio to run this prompt: Do a where's Waldo style image but it's where… 30 MIT Technology Review — AI news-outlet 1d ago Claude Science is Anthropic’s newest flagship product At an event for pharmaceutical executives, biotech founders, and researchers on Tuesday, Anthropic announced Claude Science, a major new product intended to support scientific research in the same way that Claude Code supports software engineering. Like Claude Code, Claude… 25 Simon Willison community 1d ago What's new in Claude Sonnet 5 What's new in Claude Sonnet 5 Claude Sonnet 5 came out this morning . I always head straight for the "what's new" developer docs because they tend to have more actionable information than the official announcement post. Anthropic say of Sonnet 5 that "its performance is… 26 Hacker News — AI on Front Page community 1d ago I ported Kubernetes to the browser https://github.com/ngrok/webernetes https://webernetes-demo.ngrok.app/ Comments URL: https://news.ycombinator.com/item?id=48738985 Points: 263 # Comments: 80 21 Hugging Face Daily Papers research 1d ago Delayed Verification Destabilizes Multi-Agent LLM Belief: Instability Thresholds and Optimal Corrector Placement Abstract Delayed verification in multi-agent LLM systems can cause instability leading to oscillations, but grounded factual answering stabilizes the system by making truth an absorbing boundary. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multi-agent large language model (LLM)… 14 r/LocalLLaMA community 1d ago Agents-A1 GGUF quants (35B Qwen3.5-MoE agent model) — NVFP4 for Blackwell + working MTP speculative decoding (up to 1.22× single-user, 91% draft acceptance) Repo → huggingface.co/LordNeel/Agents-A1-GGUF I made GGUF quants of InternScience/Agents-A1 — a 35B Mixture of Experts agent model (Qwen3.5-MoE, ~3B active, 256 experts / 8+1 active, hybrid linear+full attention, 256K context). It's built for long-horizon search, tool-calling,… 27 r/LocalLLaMA community 1d ago Devs - you have 64gb of VRAM - which model do you use for coding? I've currently settled on an unsloth version of Qwen 3.5 122b-a10b model (UD-IQ4_NL). With 100k bf16 context window, I only had to load a few layers into CPU/RAM, it runs around 30 tok/sec which is fine for me. I've tested many models, hours of testing but I am currently deeply… 32 r/LocalLLaMA community 1d ago Dual RTX 6000, for Deepseek v4 Flash??? My last post got a lot of interaction asking 6000 pro owners if they regretted, the answer was hard NO. I ended up understanding that dual rtx 6000 pro run deepseek v4 flash extremely fast. I went to the near stores and got offers around $50-60k for dual rtx 6000 pro ai server.… 36 Hugging Face Daily Papers research 1d ago LLM Program Optimization via Retrieval Augmented Search Abstract Blackbox adaptation methods using retrieval-augmented search and atomic edit decomposition improve program optimization performance for both C++ and Python code. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Recent work has demonstrated the potential of large language… 19 r/LocalLLaMA community 1d ago The harness matters more than the model. A 27B behind good critics changed my mind. I saw someone test Qwen3.6-27B with a 3-critic harness. The harness included code review, test review and Playwright e2e. Each critic had context. The result was that the model is usable for coding work. This matches what I have come to believe from running agents in production.… 20 TechCrunch — AI news-outlet 1d ago Anthropic launches Claude Sonnet 5 as a cheaper way to run agents Anthropic’s Claude Sonnet 5 brings stronger agentic capabilities, lower pricing, and improved safety, positioning the model as a cheaper alternative to Opus, GPT-5.5, and Gemini Pro. 5 Hacker News — AI on Front Page community 1d ago Claude Sonnet 5 Article URL: https://www.anthropic.com/news/claude-sonnet-5 Comments URL: https://news.ycombinator.com/item?id=48736605 Points: 300 # Comments: 145 8 Anthropic SDK (Python) releases dev-tools 1d ago v0.114.0 0.114.0 (2026-06-30) Full Changelog: v0.113.0...v0.114.0 Features api: add support for claude-sonnet-5 ( b893033 ) Bug Fixes agent_toolset: allow absolute paths that resolve inside workdir ( #121 ) ( 0105529 ) 9 Hugging Face Daily Papers research 1d ago Mind the Heads: Topological Representation Alignment for Multimodal LLMs Abstract HeRA aligns individual attention heads in MLLMs to preserve local neighborhood relationships across modalities, improving vision-centric task performance and reducing visual hallucinations. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Representation alignment has… 27 r/LocalLLaMA community 1d ago HIP: use hipBLAS for dense prefill on gfx900, keep MMQ for MoE by DEV-DUFORD · Pull Request #24588 · ggml-org/llama.cpp Overall Performance Gains: Qwen3.5 4B : +36.1% Qwen3.6 27B : +18.9% Gemma4 12B : +65.1% Overall average : ~40% Only for gfx900 related GPUs: Vega GPU, codename vega10, including Radeon Vega Frontier Edition, Radeon RX Vega 56/64, Radeon RX Vega 64 Liquid, Radeon Pro Vega… 5 Hacker News — AI on Front Page community 1d ago Claude Science Article URL: https://claude.com/product/claude-science Comments URL: https://news.ycombinator.com/item?id=48735770 Points: 371 # Comments: 122 28 TechCrunch — AI news-outlet 1d ago Anthropic’s Claude Science bets on workflow, not a new model, to win over scientists Anthropic's Claude Science is a workbench that gives scientists one environment to do computational research, saving them from the need to bounce between databases, pipelines, and tools. 18 llama.cpp releases dev-tools 1d ago b9850 model : register t_layer_inp for qwen3next ( #25141 ) Fix input assignment in layer processing loop Fix DFLASH for qwen-coder-next add line break Added tensor for attention normalization in Qwen3 model. macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI… 37 Simon Willison community 1d ago Have your agent record video demos of its work with shot-scraper video shot-scraper video is a new command introduced in today's shot-scraper 1.10 release which accepts a storyboard.yml file defining a routine to run against a web application and uses Playwright to record a video of that routine. I've written before about the importance of having… 31 Hugging Face Daily Papers research 1d ago SWE-Together: Evaluating Coding Agents in Interactive User Sessions Abstract SWE-Together is a multi-turn coding benchmark created from real user-agent interactions, featuring a reactive LLM simulator to evaluate agents based on both final correctness and interaction efficiency. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Most coding-agent… 32 Hugging Face Daily Papers research 1d ago DreamForge-World 0.1 Preview: A Low-Compute Real-Time Controllable World Model Abstract DreamForge-World 0.1 Preview adapts a video generation architecture with a residual action pathway to enable real-time interactive world simulation on consumer hardware with low computational requirements. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We present… 18 Google DeepMind official-blog 1d ago Start building with Nano Banana 2 Lite and Gemini Omni Flash Start building with Nano Banana 2 Lite and Gemini Omni Flash Jun 30, 2026 · Share x.com Facebook LinkedIn Mail We’re making it easier to experiment and scale your ideas with Nano Banana 2 Lite, our fastest, most cost-efficient Gemini Image model, and Gemini Omni Flash for… 20 Hacker News — AI on Front Page community 1d ago Claude Code Is Steganographically Marking Requests Article URL: https://thereallo.dev/blog/claude-code-prompt-steganography Comments URL: https://news.ycombinator.com/item?id=48734373 Points: 303 # Comments: 97 23 Hugging Face Daily Papers research 1d ago RocketSmith: Agentic Additive Manufacturing of High-Powered Rockets Abstract An agentic system using large language models automates high-power rocket design processes, enabling successful flight testing with consistent simulation results. Generated by Qwen/Qwen2.5-Coder-32B-Instruct RocketSmith is an agentic system which intelligently automates… 9 Hugging Face Daily Papers research 1d ago A Gravitational Interpretation of Fine-Tuning Reversion Abstract Post-alignment safety degradation arises from geometric properties of training history, where fine-tuning reversion follows a persistent direction defined by early training dynamics. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Fine-tuning on harmless data can partially… 35 Hugging Face Daily Papers research 1d ago One Model, Many Latencies: Universal Speech Enhancement for Diverse Real-Time Applications Abstract A universal speech enhancement model with configurable algorithmic and computational latency controls using parallel convolutions and early-exit mechanisms. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Different real-time speech applications impose distinct latency… 9 Simon Willison community 1d ago shot-scraper 1.10 Release: shot-scraper 1.10 The big new feature is shot-scraper video storyboard.yml , described in detail in Have your agent record video demos of its work with shot-scraper video . Tags: shot-scraper 11 TechCrunch — AI news-outlet 1d ago X now offers an MCP server to make its platform easier for AI tools to use X has launched a hosted MCP server, making it easier for developers to connect AI applications with the company’s API. 24 TechCrunch — AI news-outlet 1d ago Amazon launches new $1 billion FDE org, following OpenAI and Anthropic Engineers on the new team will embed within companies to deploy purpose-built agents, focusing on fast deployments and customer self-sufficiency. 28 Hugging Face Daily Papers research 1d ago SAM2Matting: Generalized Image and Video Matting Abstract SAM2Matting advances video matting by decoupling tracking and matting tasks through a tracker-to-matting framework that leverages foundational trackers with region-proposal bridges and dedicated matting heads. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Despite… 36 r/LocalLLaMA community 1d ago PageStorm: A Model Built for Creative Book Writing Over a year ago, we set out to build a single-turn full-book writing model. Half a year ago, we published our LongPage Dataset for book scale creative writing. Today, we are announcing our first model: PageStorm Research Preview. Paper: https://arxiv.org/abs/2605.17064 Models:… 9 Hugging Face Daily Papers research 1d ago One-Step Gradient Delay is Not a Barrier for Large-Scale Asynchronous Pipeline Parallel LLM Pretraining Abstract Asynchronous pipeline parallelism with PipeDream-2BW can achieve near-synchronous performance through optimizer selection and error feedback correction, overcoming traditional stability concerns. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Modern large-scale LLM… 29 r/LocalLLaMA community 1d ago I benchmarked full tool catalog vs ranked catalog on a local model: 8% → 77% accuracy Been running agents locally for a while and kept hitting the same issue: the more tools I added, the worse the model got at picking the right one.. So I finally benchmarked it properly.. Setup: qwen3.5-class model on an M4 MacBook, 100 tools in the catalog. One run with the full… 23 r/LocalLLaMA community 1d ago HydraHead: From Head-Level Functional Heterogeneity to Specialized Attention Hybridization (from the Qwen team) The quadratic complexity of attention poses a critical bottleneck for long-context processing, spurring interest in hybrid attention designs. Most open-source hybrid models adopt a layer-wise strategy. Yet, prior work has noted the inherent difficulty of integrating Linear… 9 r/LocalLLaMA community 1d ago Qwen 3.6 27B Speculative Decoding Bench: Pushing ~100 TPS on a single RTX 3090 First of all, a huge thank you to the r/LocalLLaMA community and the 3090 club. This benchmark started from your shared recipes... These are my findings on my hardware (Xeon E5-2666v3, 64GB RAM, single RTX 3090 24GB) comparing 5 engines (3 llama.cpp forks + mainline + Lucebox)… 12 Hugging Face Daily Papers research 1d ago RaysUp: Ultra-light Universal Feature Upsampling via Geometry-Aware Ray Representation Abstract RaysUp is a lightweight, task-agnostic feature upsampling framework that reconstructs high-resolution features using geometry-aware ray domain techniques with improved efficiency and accuracy. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Pre-trained Vision Foundation… 37 r/LocalLLaMA community 1d ago Huawei open-sources OpenPangu-2.0-Flash - 92B total,6B active https://x.com/Chinazhidx/status/2071877413685109071 TODAY: #Huawei open-sources OpenPangu-2.0-Flash #OpenPangu 2.0 includes two 512K-context models: • Flash: 92B total,6B active—Weights+inference code+training ops released • Pro: 505B total,18B active—flagship model, coming in… 38 Hugging Face Daily Papers research 1d ago Illuminating Unified Multimodal Model for Free-form Interleaved Text-Image Generation Abstract ILLUME-X is a unified multimodal paradigm that enhances text-image generation through improved data efficiency, stable training processes, and comprehensive evaluation metrics. Generated by Qwen/Qwen2.5-Coder-32B-Instruct The advancement of generative AI models capable… 17 r/LocalLLaMA community 1d ago Bartowski has delivered DS4 GGUF Looking forward to compare with Antirez's DS4 imamtrix https://huggingface.co/bartowski/DeepSeek-V4-Flash-GGUF   submitted by   /u/challis88ocarina [link]   [comments] 31 r/LocalLLaMA community 1d ago MTP-only GGUF subsets: Qwen3.5/3.6 They are just MTP-only GGUF subsets of Qwen3.5/3.6 Medium/Large (27B and above) models (to accelerate token generation of Qwen-based models without MTP tensors ). But I hope they help experimenting with various Qwen3.5/3.6-based fine-tunes. The reason I originally created some… 30 r/LocalLLaMA community 1d ago nvidia/Qwen3.6-27B-NVFP4 just dropped https://huggingface.co/nvidia/Qwen3.6-27B-NVFP4   submitted by   /u/vanbukin [link]   [comments] 37 Hugging Face Daily Papers research 1d ago Beyond IID: How General Are Tabular Foundation Models, Really? Abstract Tabular foundation models show varying performance across different data conditions, with traditional methods still outperforming newer approaches on complex, large-scale datasets. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Foundation models for predictive machine… 15 r/LocalLLaMA community 1d ago Norm-preserving abliteration on Qwen3.6-35B-A3B: 0% refusal, benchmarks intact, open source dataset Been reading the mechanistic interpretability literature on refusal for a while now. The core insight from Arditi et al. (2024) is clean: refusal is mediated by a geometrically consistent direction in the residual stream. You can find it via the difference of means between… 4 Hugging Face Daily Papers research 1d ago Agentic Abstention: Do Agents Know When to Stop Instead of Act? Abstract Agentic abstention involves determining when an AI agent should cease interaction under uncertainty, requiring sequential decision-making across multiple environments and task types. Generated by Qwen/Qwen2.5-Coder-32B-Instruct LLM agents are expected to act over… 13 Page 2 of 10 · 500 articles ← Newer Older →