Model releases
500 articles archived under #model-release

r/LocalLLaMA · community · 2h ago
SenseNova-U1-8b-MoT-Infographic-V2 (released yesterday) - An open source SOTA beast for infographic design and image editing. I’m pretty jaded like most of y’all. I don’t really get excited by new models much anymore. Last few weeks have been kinda meh to be honest. Monday, I stumbled upon SenseNova’s Mixture of Transformers models and they seem kinda like a different animal than other typical image… 4

r/LocalLLaMA · community · 2h ago
They fit! Mostly.... 2x 3090, Thermaltake Core p3 Got another 3090 had to print a bracket to angle the radiator and make room for the GPUs 💀 ended up liking the look more than I thought ..qwen 27b go brrrrr (submitted by /u/anthonyg45157) 6

Hugging Face Daily Papers · research · 4h ago
MemLearner: Learning to Query Context memory for Video World Models Abstract MemLearner improves video world models by using learning-based adaptive context querying with query tokens to enhance scene consistency and memory in long video sequences with occlusions and dynamic objects. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Video World… 24

Don't Worry About the Vase · community · 7h ago
Claude Sonnet 5 Is Not Frontier But Has Its Uses Fable 5 is back today, baby! Premium subscribers have one week to use it within their subscriptions. First hit’s free. Then you pay by the token. 18

r/MachineLearning · community · 9h ago
New PyMuPDF release, supports Markdown [N] https://pymupdf.io/blog/markdown-in-pymupdf-1-28 PyMuPDF 1.28 release, introduces Markdown as a first class document in PyMuPDF. Seems useful for a variety of workflows. You can create PDFs from Markdown text with control over appearance using CSS submitted by … 9

r/LocalLLaMA · community · 11h ago
What should I test when comparing Qwen3.6-27b quants for real world effects that humans could reason about?
I tried to find some good comparisons on how different quants of Qwen3.6-27b perform in different scenarios, but I failed to find good information on what kinds of real world effects there are to running different quants like Q4_K_M, UD-Q4_K_XL, UD-Q5_K_XL, UD-Q6_K_XL and… 37

TechCrunch — AI · news-outlet · 11h ago
Ashton Kutcher leaving Sound Ventures to launch new VC firm with Morgan Beller The actor and investor is joining forces with Morgan Beller, who was previously a GP at NFX, to invest in early-stage startups. 25

Hugging Face Daily Papers · research · 11h ago
TRIAGE: Role-Typed Credit Assignment for Agentic Reinforcement Learning Abstract TRIAGE introduces a role-typed credit assignment framework that enhances agentic reinforcement learning by providing more nuanced credit assignment than standard GRPO methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Agentic reinforcement learning requires assigning… 26

r/LocalLLaMA · community · 12h ago
Llama-b9856 Win Cuda 12.4 - Windows Defender claims it's a trojan Hi, just downloaded this release earlier today. Attempted to run llama-server, and Windows Defender shut it down. It says it's Wacatac.H!ml. It removed the llama-server-impl.dll file from the folder. Older releases work fine (submitted by /u/Far_Course2496)… 10

r/LocalLLaMA · community · 12h ago
Deepseek Flash V4 at IQ2 or Qwen 3.6 27B Q5KM ? Any tests or benchmarks ? Deepseek Flash V4 at IQ2 or Qwen 3.6 27B Q5KM ? Any tests or benchmarks ? Wondering which one would be better at speed / coding / reasoning (submitted by /u/soyalemujica) 32

Hugging Face Daily Papers · research · 12h ago
SWE-INTERACT: Reimagining SWE Benchmarks as User-Driven Long-Horizon Coding Sessions Abstract SWE-Interact presents a testbed that evaluates coding agents in realistic multi-turn, user-driven software engineering scenarios, revealing significant gaps between single-turn performance and interactive task completion.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct We… 6

r/LocalLLaMA · community · 13h ago
Plurality Released: fully Free and Open Source AI agents/chatbot platform for local AI Hello everyone! Some of you might recognize my user from the work I have done on Cosmos Cloud, but today I am here to talk to you about an entirely different project: Plurality. https://github.com/azukaar/plurality Plurality has been in development for a bit more than a year and… 22

r/LocalLLaMA · community · 13h ago
How to improve RAM offload? I have only 12GB VRAM (RTX3060) but have enough RAM to run Qwen3.6 27B Q4 with offload. Something tells me that it won't achieve maximum performance but why DRAM speed is only around 30GB/s (HWiNFO data) during inference with dual channel 5200 RAM? TG is 3.12 tok/sec with 18K… 38

Hugging Face Daily Papers · research · 13h ago
Hierarchical Experimentalist Agents Abstract HExA enables large language models to improve through active experimentation and skill learning in novel domains without requiring training or external supervision. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large language models (LLMs) are increasingly used to take… 24

Ars Technica — AI · news-outlet · 13h ago
After spooking Trump into safety testing, Anthropic AI models get global release US lifts curbs on Anthropic’s advanced Fable and Mythos models. 31

Hugging Face Daily Papers · research · 14h ago
Does VLA Even Know the Basics? Measuring Commonsense and World Knowledge Retention in Vision-Language-Action Models Abstract Act2Answer protocol evaluates embodied vision-language-action models by having agents answer questions through physical actions, revealing knowledge retention and generalization patterns across different semantic categories.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct… 35

r/LocalLLaMA · community · 14h ago
The gap between closed and open models might be much smaller than commonly assumed, because we don’t know what closed model providers do *in addition to* model inference When Claude dominates GLM-5.2 in benchmarks, it’s usually assumed that Anthropic has superior model architectures, superior training pipelines, and other advanced machine learning techniques that make their models better than the competition. But actually, this doesn’t follow.… 10

r/LocalLLaMA · community · 15h ago
SWE-rebench leaderboard update: GLM-5.2, Qwen3.6-27B, Qwen3.6-35B-A3B, Gemma 4 31B and more + improved UI Hi all, We made several updates to the SWE-rebench leaderboard: added new models, refreshed recent results, and reworked the leaderboard UI to make results easier to read, compare, and understand. New Models: Claude Opus 4.8 xhigh: 56.5% — 2.48M tokens GLM-5.2: 51.1% — 2.62M… 16

Latent.Space · news-outlet · 15h ago
🔬 The Coolest Diffusion Research Isn't in LLMs — Evan Feinberg & Sergey Edunov, Genesis Molecular AI Why the Llama lead left Meta for drug discovery, PEARL's zero-shot OpenBind win, and what becomes possible when co-folding finally crosses the accuracy threshold. 10

TechCrunch — AI · news-outlet · 16h ago
Gemini Spark, Google’s agentic assistant, is now available on Mac Google's 24/7 agentic assistant, Gemini Spark, comes to Mac alongside other improvements, like real-time tracking and support for more apps. 35

r/LocalLLaMA · community · 16h ago
Deepseek V4 Flash 2, 3 and 4 bits GGUFs (submitted by /u/tarruda) 31

r/LocalLLaMA · community · 16h ago
Best tps can I get with Qwen3.5 122B on 32GB VRAM + 64GB RAM? My attempt at running Qwen3.5 122B on my 5090 (32GB VRAM) + 64GB RAM is really bleak. I'm getting a speed that starts at 6 tps and ends at ~20 tps. Can I improve this further?
build/bin/llama-server \ -m… 21

Hugging Face Daily Papers · research · 17h ago
Goku: A Million-Scale Universal Dataset and Benchmark for Instruction-Based Video Editing Abstract A large-scale video editing dataset and model are introduced that support multi-task and structural manipulations through advanced data synthesis and network architectures. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Existing instruction-based video editing datasets… 38

r/LocalLLaMA · community · 17h ago
Non Us Ally should be afraid. Spyware-like code in Claude Code that covertly targets Chinese users. (submitted by /u/zakadit) 28

Hacker News — AI on Front Page · community · 18h ago
Box3D, an open source 3D physics engine Article URL: https://box2d.org/posts/2026/06/announcing-box3d/ Comments URL: https://news.ycombinator.com/item?id=48745445 Points: 246 # Comments: 47 12

Hugging Face Daily Papers · research · 18h ago
FlexiSLM: A Dynamic and Controllable Frame Rate Spoken Language Model Abstract Flexible Spoken Language Model (FlexiSLM) introduces dynamic frame rate capabilities for speech input and output, achieving superior performance over fixed-frame-rate models while enabling controllable inference speed. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Spoken… 15

Hugging Face Daily Papers · research · 19h ago
Managing Procedural Memory in LLM Agents: Control, Adaptation, and Evaluation Abstract Procedural memory enhances LLM agents on workplace tasks through skill transfer across roles and models, with varying generalization capabilities affecting deployment strategies. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Procedural memory is increasingly used to… 22

r/LocalLLaMA · community · 20h ago
Why can i never stop the looping? I constantly see people here saying Qwen3.6 35B is amazing, Ornith V1 is amazing, but i cannot use these models at all without severe looping problems. What the hell am i doing wrong??
Temp 0.6 top_p 0.95 top_k 20 min_p 0.05 rep_penalty 1.1 Using Q6 of both models with K/V at… 35

Hugging Face Daily Papers · research · 21h ago
SkillHone: A Harness for Continual Agent Skill Evolution Through Persistent Decision History Abstract SkillHone enables continuous evolution of agent skills by maintaining persistent decision histories and incorporating practice feedback for improved performance across research and tool-mediated analysis tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Agent skills… 35

Hugging Face Daily Papers · research · 21h ago
DataEvolver: Self-Evolving Multi-Agent Data Construction for Text-Rich Image Generation Abstract DataEvolver is a self-evolving multi-agent framework that improves text-rich image generation by leveraging feedback from rejected samples to iteratively enhance data quality. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Text-rich image generation is one of the most… 11

Hugging Face Daily Papers · research · 21h ago
Scenes as Objects, Not Primitives: Instance-Structured 3D Tokenization from Unposed Views Abstract A feed-forward framework decomposes 3D scenes into instance-structured token groups from multi-view images, enabling direct object-level reconstruction, segmentation, and manipulation without 3D annotations. Generated by Qwen/Qwen2.5-Coder-32B-Instruct A 3D scene is… 38

Hugging Face Daily Papers · research · 21h ago
RedVox: Safety and Fairness Gaps in Speech Models Across Languages Abstract Multilingual safety and fairness benchmark for speech models reveals persistent vulnerabilities across languages and naturalistic conditions. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Speech-capable models are increasingly deployed in real-world applications across… 36

Hugging Face Daily Papers · research · 22h ago
Xiaomi-GUI-0 Technical Report Abstract A native multimodal GUI agent trained in real-device environments demonstrates superior performance and stability compared to traditional benchmark-based approaches.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct Graphical user interface (GUI) agents build on… 7

Vercel — AI · dev-tools · 23h ago
Claude Fable 5 access restored on AI Gateway Access to Claude Fable 5, the Mythos-class model, has now been restored on AI Gateway following the US Government's decision to lift the export controls. Fable 5 is the same model that was available between June 9 and June 12. What has changed is the safety classifiers, which… 27

r/LocalLLaMA · community · 1d ago
Ketch - Best Search Tool for local models recently I wrote a blog post, to find which search tool will be best for the pi coding agent paired with local models (currently I use Qwen3.6 35B) Before that I were using firecrawl or brave-search, but found them very decent, so I went to SearXNG, which is fine, but lacks some… 38

Hugging Face Daily Papers · research · 1d ago
Little Brains, Big Feats: Exploring Compact Language Models Abstract Small language models can effectively perform retrieval-augmented generation tasks directly on-device without GPU acceleration. Generated by Qwen/Qwen2.5-Coder-32B-Instruct While large language models have been dominating the research landscape recently, small language… 13

Hugging Face Daily Papers · research · 1d ago
Multi-Block Diffusion Language Models Abstract Multi-Block Diffusion Language Models extend single-block diffusion to concurrent block decoding with improved training strategies and optimized decoding algorithms. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Block Diffusion Language Models (BD-LMs) improve… 35

Hugging Face Daily Papers · research · 1d ago
Reinforcement Learning with Metacognitive Feedback Elicits Faithful Uncertainty Expression in LLMs Abstract Reinforcement learning with metacognitive feedback and metacognitive data selection improve large language model calibration by enabling accurate self-assessment of performance and uncertainty.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct Metacognition is a critical… 38

Hugging Face Daily Papers · research · 1d ago
TerraDiT-Ω: Unified Spatial Control for Satellite Image Synthesis with Any Geospatial Primitive Abstract TerraDiT-Ω generates satellite imagery from native geospatial primitives using Geometry-Aware Local Attention, enabling flexible conditioning and improved downstream geospatial tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Generative models have achieved… 36

arXiv — Machine Learning · research · 1d ago
Knowledge Distillation from Large Reasoning Models to Compact Student Models: A Case Study on the John O Bryan Mathematics Competition arXiv:2606.31048v1 Announce Type: new Abstract: This paper investigates knowledge distillation from a large reasoning model (DeepSeek-R1) to a compact student model (Qwen2.5-7B). Using historical problems from the John O'Bryan Mathematics Competition at Northern Kentucky… 7

arXiv — Machine Learning · research · 1d ago
BEST-RQ-2: Contextualize-Then-Predict, a Two-Step Approach for Self-Supervised Audio Representations arXiv:2606.30700v1 Announce Type: cross Abstract: Self-supervised learning enables audio representations that transfer across domains and tasks. We present BEST-RQ-2, an evolution of BEST-RQ that retains frozen random-projection-based discrete targets while introducing a two-step… 19

Hugging Face Daily Papers · research · 1d ago
BlockPilot: Instance-Adaptive Policy Learning for Diffusion-based Speculative Decoding Abstract Speculative decoding with adaptive block size selection improves inference efficiency by predicting optimal block sizes from prefilling representations, achieving significant speedup with minimal overhead. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Speculative… 30

Latent.Space · news-outlet · 1d ago
[AINews] Sonnet 5 today, and Fable 5 tomorrow Everything is open again!
7

Hugging Face Daily Papers · research · 1d ago
AVTok: 1D Unified Tokenization for Holistic Audio-Video Generation Abstract AVTok is a unified tokenizer for audio-video generation that uses a dual-stream transformer architecture with shared encoder-decoder and modal-specific queries to create compact one-dimensional latent representations. Generated by Qwen/Qwen2.5-Coder-32B-Instruct… 21

TechCrunch — AI · news-outlet · 1d ago
Wayve launches $85M employee tender offer at $8.5B valuation Wayve’s offering is part of a growing trend of AI startups using employee tenders as a strategic tool to attract and retain talent. 31

Hugging Face Daily Papers · research · 1d ago
Dockerless: Environment-Free Program Verifier for Coding Agents Abstract A Dockerless environment-free agentic patch verifier improves code patch evaluation accuracy and enables effective post-training without execution-based verification costs. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Program verifiers play a central role in training… 21

r/LocalLLaMA · community · 1d ago
Biggest, baddest model to fill 144GB VRAM + 120GB RAM to the brim, regardless of speed I'm trying to round out my quiver of daily driver models for my personal harness. Right now I drive qwen3.6 27b for balanced code and gemma4 31b for human interaction with lots of context and a few parallel sessions. Minimax M2.7 at Q6 clocks in at 207gb base and just barely… 5

r/LocalLLaMA · community · 1d ago
[audio.cpp] VibeVoice 1.5B released — 90-min podcast in 22.95 min, 4.08x real-time, 2.86x faster than Python without quantization. Native C++/ggml I’m the author of audio.cpp, a C++/ggml runtime for local audio models. I just added VibeVoice 1.5B support and wanted to share the benchmark because long-form multi-speaker TTS is a good stress test for local inference runtimes.
Result on RTX 5090: VibeVoice 1.5B Audio length:… 26

Hugging Face Daily Papers · research · 1d ago
OSWorld2.0: Benchmarking Computer Use Agents on Long-Horizon Real-World Tasks Abstract OSWorld 2.0 presents a comprehensive benchmark for evaluating computer-use agents through complex, real-world workflows that reveal current limitations in agent reasoning and task completion. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Existing computer-use benchmarks… 24

r/LocalLLaMA · community · 1d ago
Claude Code Is Steganographically Marking Requests (submitted by /u/johnnyApplePRNG) 21