Tag

Model releases

500 articles archived under #model-release

r/LocalLLaMA community 2h ago

SenseNova-U1-8b-MoT-Infographic-V2 (released yesterday) - An open source SOTA beast for infographic design and image editing.

I’m pretty jaded like most of y’all. I don’t really get excited by new models much anymore. Last few weeks have been kinda meh to be honest. Monday, I stumbled upon SenseNova’s Mixture of Transformers models and they seem kinda like a different animal than other typical image…

4
r/LocalLLaMA community 2h ago

They fit! Mostly.... 2x 3090, Thermaltake Core p3

Got another 3090 and had to print a bracket to angle the radiator and make room for the GPUs 💀. Ended up liking the look more than I thought. qwen 27b go brrrrr. Submitted by /u/anthonyg45157

6
Hugging Face Daily Papers research 4h ago

MemLearner: Learning to Query Context memory for Video World Models

Abstract: MemLearner improves video world models by using learning-based adaptive context querying with query tokens to enhance scene consistency and memory in long video sequences with occlusions and dynamic objects. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Video World…

24
Don't Worry About the Vase community 7h ago

Claude Sonnet 5 Is Not Frontier But Has Its Uses

Fable 5 is back today, baby! Premium subscribers have one week to use it within their subscriptions. First hit’s free. Then you pay by the token.

18
r/MachineLearning community 9h ago

New PyMuPDF release, supports Markdown [N]

https://pymupdf.io/blog/markdown-in-pymupdf-1-28 The PyMuPDF 1.28 release introduces Markdown as a first-class document in PyMuPDF. Seems useful for a variety of workflows. You can create PDFs from Markdown text with control over appearance using CSS…

9
r/LocalLLaMA community 11h ago

What should I test when comparing Qwen3.6-27b quants for real world effects that humans could reason about?

I tried to find some good comparisons on how different quants of Qwen3.6-27b perform in different scenarios, but I failed to find good information on what kinds of real world effects there are to running different quants like Q4_K_M, UD-Q4_K_XL, UD-Q5_K_XL, UD-Q6_K_XL and…

37
TechCrunch — AI news-outlet 11h ago

Ashton Kutcher leaving Sound Ventures to launch new VC firm with Morgan Beller

The actor and investor is joining forces with Morgan Beller, who was previously a GP at NFX, to invest in early-stage startups.

25
Hugging Face Daily Papers research 11h ago

TRIAGE: Role-Typed Credit Assignment for Agentic Reinforcement Learning

Abstract: TRIAGE introduces a role-typed credit assignment framework that enhances agentic reinforcement learning by providing more nuanced credit assignment than standard GRPO methods. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Agentic reinforcement learning requires assigning…

26
r/LocalLLaMA community 12h ago

Llama-b9856 Win Cuda 12.4 - Windows Defender claims it's a trojan

Hi, just downloaded this release earlier today. Attempted to run llama-server, and Windows Defender shut it down. It says it's Wacatac.H!ml. It removed the llama-server-impl.dll file from the folder. Older releases work fine. Submitted by /u/Far_Course2496

10
r/LocalLLaMA community 12h ago

Deepseek Flash V4 at IQ2 or Qwen 3.6 27B Q5KM ? Any tests or benchmarks ?

Wondering which one would be better at speed / coding / reasoning. Submitted by /u/soyalemujica

32
Hugging Face Daily Papers research 12h ago

SWE-INTERACT: Reimagining SWE Benchmarks as User-Driven Long-Horizon Coding Sessions

Abstract: SWE-Interact presents a testbed that evaluates coding agents in realistic multi-turn, user-driven software engineering scenarios, revealing significant gaps between single-turn performance and interactive task completion. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) We…

6
r/LocalLLaMA community 13h ago

Plurality Released: fully Free and Open Source AI agents/chatbot platform for local AI

Hello everyone! Some of you might recognize my user from the work I have done on Cosmos Cloud, but today I am here to talk to you about an entirely different project: Plurality. https://github.com/azukaar/plurality Plurality has been in development for a bit more than a year and…

22
r/LocalLLaMA community 13h ago

How to improve RAM offload?

I have only 12GB VRAM (RTX 3060) but enough RAM to run Qwen3.6 27B Q4 with offload. Something tells me it won't achieve maximum performance, but why is DRAM speed only around 30GB/s (HWiNFO data) during inference with dual-channel 5200 RAM? TG is 3.12 tok/sec with 18K…

38
Hugging Face Daily Papers research 13h ago

Hierarchical Experimentalist Agents

Abstract: HExA enables large language models to improve through active experimentation and skill learning in novel domains without requiring training or external supervision. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Large language models (LLMs) are increasingly used to take…

24
Ars Technica — AI news-outlet 13h ago

After spooking Trump into safety testing, Anthropic AI models get global release

US lifts curbs on Anthropic’s advanced Fable and Mythos models.

31
Hugging Face Daily Papers research 14h ago

Does VLA Even Know the Basics? Measuring Commonsense and World Knowledge Retention in Vision-Language-Action Models

Abstract: Act2Answer protocol evaluates embodied vision-language-action models by having agents answer questions through physical actions, revealing knowledge retention and generalization patterns across different semantic categories. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.)…

35
r/LocalLLaMA community 14h ago

The gap between closed and open models might be much smaller than commonly assumed, because we don’t know what closed model providers do *in addition to* model inference

When Claude dominates GLM-5.2 in benchmarks, it’s usually assumed that Anthropic has superior model architectures, superior training pipelines, and other advanced machine learning techniques that make their models better than the competition. But actually, this doesn’t follow.…

10
r/LocalLLaMA community 15h ago

SWE-rebench leaderboard update: GLM-5.2, Qwen3.6-27B, Qwen3.6-35B-A3B, Gemma 4 31B and more + improved UI

Hi all, we made several updates to the SWE-rebench leaderboard: added new models, refreshed recent results, and reworked the leaderboard UI to make results easier to read, compare, and understand. New models: Claude Opus 4.8 xhigh: 56.5%, 2.48M tokens; GLM-5.2: 51.1%, 2.62M…

16
Latent.Space news-outlet 15h ago

🔬 The Coolest Diffusion Research Isn't in LLMs — Evan Feinberg & Sergey Edunov, Genesis Molecular AI

Why the Llama lead left Meta for drug discovery, PEARL's zero-shot OpenBind win, and what becomes possible when co-folding finally crosses the accuracy threshold.

10
TechCrunch — AI news-outlet 16h ago

Gemini Spark, Google’s agentic assistant, is now available on Mac

Google's 24/7 agentic assistant, Gemini Spark, comes to Mac alongside other improvements, like real-time tracking and support for more apps.

35
r/LocalLLaMA community 16h ago

Deepseek V4 Flash 2, 3 and 4 bits GGUFs

Submitted by /u/tarruda

31
r/LocalLLaMA community 16h ago

Best tps can I get with Qwen3.5 122B on 32GB VRAM + 64GB RAM?

My attempt at running Qwen3.5 122B on my 5090 (32GB VRAM) + 64GB RAM is really bleak. I'm getting a speed that starts at 6 tps and ends at ~20 tps. Can I improve this further? build/bin/llama-server \ -m…

21
Hugging Face Daily Papers research 17h ago

Goku: A Million-Scale Universal Dataset and Benchmark for Instruction-Based Video Editing

Abstract: A large-scale video editing dataset and model are introduced that support multi-task and structural manipulations through advanced data synthesis and network architectures. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Existing instruction-based video editing datasets…

38
r/LocalLLaMA community 17h ago

Non Us Ally should be afraid.

Spyware-like code in Claude Code that covertly targets Chinese users. Submitted by /u/zakadit

28
Hacker News — AI on Front Page community 18h ago

Box3D, an open source 3D physics engine

Article URL: https://box2d.org/posts/2026/06/announcing-box3d/ | Comments URL: https://news.ycombinator.com/item?id=48745445 | Points: 246 | Comments: 47

12
Hugging Face Daily Papers research 18h ago

FlexiSLM: A Dynamic and Controllable Frame Rate Spoken Language Model

Abstract: Flexible Spoken Language Model (FlexiSLM) introduces dynamic frame rate capabilities for speech input and output, achieving superior performance over fixed-frame-rate models while enabling controllable inference speed. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Spoken…

15
Hugging Face Daily Papers research 19h ago

Managing Procedural Memory in LLM Agents: Control, Adaptation, and Evaluation

Abstract: Procedural memory enhances LLM agents on workplace tasks through skill transfer across roles and models, with varying generalization capabilities affecting deployment strategies. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Procedural memory is increasingly used to…

22
r/LocalLLaMA community 20h ago

Why can i never stop the looping?

I constantly see people here saying Qwen3.6 35B is amazing, Ornith V1 is amazing, but i cannot use these models at all without severe looping problems. What the hell am i doing wrong?? Temp 0.6 top_p 0.95 top_k 20 min_p 0.05 rep_penalty 1.1 Using Q6 of both models with K/V at…

35
Hugging Face Daily Papers research 21h ago

SkillHone: A Harness for Continual Agent Skill Evolution Through Persistent Decision History

Abstract: SkillHone enables continuous evolution of agent skills by maintaining persistent decision histories and incorporating practice feedback for improved performance across research and tool-mediated analysis tasks. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Agent skills…

35
Hugging Face Daily Papers research 21h ago

DataEvolver: Self-Evolving Multi-Agent Data Construction for Text-Rich Image Generation

Abstract: DataEvolver is a self-evolving multi-agent framework that improves text-rich image generation by leveraging feedback from rejected samples to iteratively enhance data quality. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Text-rich image generation is one of the most…

11
Hugging Face Daily Papers research 21h ago

Scenes as Objects, Not Primitives: Instance-Structured 3D Tokenization from Unposed Views

Abstract: A feed-forward framework decomposes 3D scenes into instance-structured token groups from multi-view images, enabling direct object-level reconstruction, segmentation, and manipulation without 3D annotations. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) A 3D scene is…

38
Hugging Face Daily Papers research 21h ago

RedVox: Safety and Fairness Gaps in Speech Models Across Languages

Abstract: Multilingual safety and fairness benchmark for speech models reveals persistent vulnerabilities across languages and naturalistic conditions. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Speech-capable models are increasingly deployed in real-world applications across…

36
Hugging Face Daily Papers research 22h ago

Xiaomi-GUI-0 Technical Report

Abstract: A native multimodal GUI agent trained in real-device environments demonstrates superior performance and stability compared to traditional benchmark-based approaches. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Graphical user interface (GUI) agents build on…

7
Vercel — AI dev-tools 23h ago

Claude Fable 5 access restored on AI Gateway

Access to Claude Fable 5, the Mythos-class model, has now been restored on AI Gateway following the US Government's decision to lift the export controls. Fable 5 is the same model that was available between June 9 and June 12. What has changed is the safety classifiers, which…

27
r/LocalLLaMA community 1d ago

Ketch - Best Search Tool for local models

Recently I wrote a blog post to find which search tool is best for the pi coding agent paired with local models (currently I use Qwen3.6 35B). Before that I was using firecrawl or brave-search, but found them very decent, so I went to SearXNG, which is fine, but lacks some…

38
Hugging Face Daily Papers research 1d ago

Little Brains, Big Feats: Exploring Compact Language Models

Abstract: Small language models can effectively perform retrieval-augmented generation tasks directly on-device without GPU acceleration. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) While large language models have been dominating the research landscape recently, small language…

13
Hugging Face Daily Papers research 1d ago

Multi-Block Diffusion Language Models

Abstract: Multi-Block Diffusion Language Models extend single-block diffusion to concurrent block decoding with improved training strategies and optimized decoding algorithms. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Block Diffusion Language Models (BD-LMs) improve…

35
Hugging Face Daily Papers research 1d ago

Reinforcement Learning with Metacognitive Feedback Elicits Faithful Uncertainty Expression in LLMs

Abstract: Reinforcement learning with metacognitive feedback and metacognitive data selection improve large language model calibration by enabling accurate self-assessment of performance and uncertainty. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Metacognition is a critical…

38
Hugging Face Daily Papers research 1d ago

TerraDiT-Ω: Unified Spatial Control for Satellite Image Synthesis with Any Geospatial Primitive

Abstract: TerraDiT-Ω generates satellite imagery from native geospatial primitives using Geometry-Aware Local Attention, enabling flexible conditioning and improved downstream geospatial tasks. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Generative models have achieved…

36
arXiv — Machine Learning research 1d ago

Knowledge Distillation from Large Reasoning Models to Compact Student Models: A Case Study on the John O'Bryan Mathematics Competition

arXiv:2606.31048v1. Abstract: This paper investigates knowledge distillation from a large reasoning model (DeepSeek-R1) to a compact student model (Qwen2.5-7B). Using historical problems from the John O'Bryan Mathematics Competition at Northern Kentucky…

7
arXiv — Machine Learning research 1d ago

BEST-RQ-2: Contextualize-Then-Predict, a Two-Step Approach for Self-Supervised Audio Representations

arXiv:2606.30700v1. Abstract: Self-supervised learning enables audio representations that transfer across domains and tasks. We present BEST-RQ-2, an evolution of BEST-RQ that retains frozen random-projection-based discrete targets while introducing a two-step…

19
Hugging Face Daily Papers research 1d ago

BlockPilot: Instance-Adaptive Policy Learning for Diffusion-based Speculative Decoding

Abstract: Speculative decoding with adaptive block size selection improves inference efficiency by predicting optimal block sizes from prefilling representations, achieving significant speedup with minimal overhead. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Speculative…

30
Latent.Space news-outlet 1d ago

[AINews] Sonnet 5 today, and Fable 5 tomorrow

Everything is open again!

7
Hugging Face Daily Papers research 1d ago

AVTok: 1D Unified Tokenization for Holistic Audio-Video Generation

Abstract: AVTok is a unified tokenizer for audio-video generation that uses a dual-stream transformer architecture with shared encoder-decoder and modal-specific queries to create compact one-dimensional latent representations. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.)…

21
TechCrunch — AI news-outlet 1d ago

Wayve launches $85M employee tender offer at $8.5B valuation

Wayve’s offering is part of a growing trend of AI startups using employee tenders as a strategic tool to attract and retain talent.

31
Hugging Face Daily Papers research 1d ago

Dockerless: Environment-Free Program Verifier for Coding Agents

Abstract: A Dockerless environment-free agentic patch verifier improves code patch evaluation accuracy and enables effective post-training without execution-based verification costs. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Program verifiers play a central role in training…

21
r/LocalLLaMA community 1d ago

Biggest, baddest model to fill 144GB VRAM + 120GB RAM to the brim, regardless of speed

I'm trying to round out my quiver of daily driver models for my personal harness. Right now I drive qwen3.6 27b for balanced code and gemma4 31b for human interaction with lots of context and a few parallel sessions. Minimax M2.7 at Q6 clocks in at 207gb base and just barely…

5
r/LocalLLaMA community 1d ago

[audio.cpp] VibeVoice 1.5B released — 90-min podcast in 22.95 min, 4.08x real-time, 2.86x faster than Python without quantization. Native C++/ggml

I’m the author of audio.cpp, a C++/ggml runtime for local audio models. I just added VibeVoice 1.5B support and wanted to share the benchmark because long-form multi-speaker TTS is a good stress test for local inference runtimes. Result on RTX 5090: VibeVoice 1.5B Audio length:…

26
Hugging Face Daily Papers research 1d ago

OSWorld2.0: Benchmarking Computer Use Agents on Long-Horizon Real-World Tasks

Abstract: OSWorld 2.0 presents a comprehensive benchmark for evaluating computer-use agents through complex, real-world workflows that reveal current limitations in agent reasoning and task completion. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Existing computer-use benchmarks…

24
r/LocalLLaMA community 1d ago

Claude Code Is Steganographically Marking Requests

Submitted by /u/johnnyApplePRNG

21
