Tag

Model releases

500 articles archived under #model-release

Hugging Face Daily Papers research 2d ago

TheoremGraph: Bridging Formal and Informal Mathematics

Abstract: A unified mathematical dependency graph connects informal and formal mathematics through semantic embedding and automated extraction from arXiv papers and Lean projects. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Mathematical knowledge is organized around statements…

32
Hugging Face Daily Papers research 2d ago

Learning Transferable Dynamics Priors from Action to World Modeling

Abstract: Action-conditioned world modeling enables transferable dynamics priors for robot learning through pretraining on large-scale manipulation data, supporting both simulator-based policy evaluation and video-action prediction. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) We…

27
Hugging Face Daily Papers research 2d ago

The Surprising Effectiveness of Video Diffusion Models for Hand Motion Reconstruction

Abstract: ViDiHand uses pretrained video diffusion model representations with hand-overlay rendering to reconstruct 4D hand motion directly from video frames without detectors or optimization. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) 4D hand motion reconstruction from…

31
r/LocalLLaMA community 2d ago

Tesla V100 16GB local LLMs, single and dual NVLink benchmarks

Picked up a couple of Tesla V100-SXM2-16GB modules a while back to run local models and drive Claude Code fully offline, figured the actual numbers and the traps might save someone else the pain. They've come right down in price and the 16GB of HBM2 at ~900 GB/s still holds up…

33
Hugging Face Daily Papers research 2d ago

Interleaved Speech Language Models Latently Work In Text

Abstract: Interleaved speech-text language models exhibit an implicit transcription phase where text tokens become decodable in intermediate layers, followed by text-based prediction before speech domain transformation. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Speech language…

16
Smol AI News news-outlet 2d ago

not much happened today

**Anthropic** launched **Claude Sonnet 5** as its new default mid-tier frontier model, featuring a **1M-token context window**, enhanced agentic capabilities including planning, browser and terminal tool use, and autonomous execution previously requiring larger models. The model…

27
Hugging Face Daily Papers research 2d ago

Video-MME-Logical: A Controlled Diagnostic Benchmark for Video Temporal-Logical Reasoning

Abstract: A new benchmark evaluates multimodal large language models' ability to reason over dynamic visual evidence through controlled temporal-logical operations rather than simple object recognition. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Recent interest in multimodal…

25
r/LocalLLaMA community 2d ago

Anyone using Gemma4:31b over Qwen3.6:27b or 35b(a10)

Using them in opencode. Mainly writing python scripts to set up workflows. I really do like Gemma4 even though it just sometimes doesn’t want to go the extra length. I really have to end up pushing it. It’s like really stubborn or something lol For both Qwen models, they’re…

17
Hugging Face Daily Papers research 2d ago

Trimming the Long-Tail of Visual World Modeling Evaluation

Abstract: Current visual world models demonstrate limited generalization beyond common physical interactions, struggling with rare and irregular scenarios despite achieving realism on standard benchmarks. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Physical interactions follow a…

28
arXiv — NLP / Computation & Language research 2d ago

Open but Incompatible: A License Compatibility Analysis of Corpora for Low-Resource African Languages

arXiv:2606.28867v1 Announce Type: new Abstract: Creative Commons licenses dominate African NLP corpus releases, but their compatibility rules are rarely applied. CC-BY-SA and CC-BY-NC cannot be combined in a single published dataset; a NoDerivs clause silently prohibits…

28
arXiv — NLP / Computation & Language research 2d ago

Fine-Tuning General-Purpose Large Language Models for Agricultural Applications: A Reproducible Framework and Evaluation Protocol Based on Qwen3-8B

arXiv:2606.28992v1 Announce Type: new Abstract: General-purpose large language models (LLMs) have demonstrated strong abilities in open-domain question answering, information extraction, and text generation. Agricultural applications, however, are domain-specific,…

20
arXiv — NLP / Computation & Language research 2d ago

Fast Numbers, Slow Language: Bridging Quantitative and Qualitative Earnings Signals

arXiv:2606.29734v1 Announce Type: new Abstract: Earnings announcements release two types of information sequentially: quantitative surprise (numeric earnings-per-share (EPS)/revenue versus analyst estimate) arrives first in press releases and financial news, processed by…

12
arXiv — NLP / Computation & Language research 2d ago

Are We Measuring Strategy or Phrasing? The Gap Between Surface- and Approach-Level Diversity in LLM Math Reasoning

arXiv:2606.29985v1 Announce Type: new Abstract: Diversity in LLM mathematical reasoning is critical for exploration, but common diversity metrics mostly capture surface-level variation rather than differences in how a problem is solved. We address this gap by introducing…

27
Hugging Face Daily Papers research 2d ago

LiveEdit: Towards Real-Time Diffusion-Based Streaming Video Editing

Abstract: A novel streaming video editing framework enables causal, frame-by-frame editing with stable long-horizon preservation and real-time responsiveness through a three-stage distillation pipeline and AR-oriented mask cache. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) …

24
Hugging Face Daily Papers research 2d ago

Geometric Stability of Neural Population Codes: Regional Variation, Behavioral Relevance, and Circuit Dependence

Abstract: Geometric stability measures the consistency of pairwise stimulus distances across trials, revealing a distinct aspect of neural representation that differs from temporal stability and decoding accuracy. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Current models of…

27
Hugging Face Daily Papers research 2d ago

Walking in the Implicit: Interactive World Exploration via Neural Scene Representation

Abstract: NeuWorld enables efficient interactive video generation by representing scenes as compact neural implicit states and using a transformer VAE with diffusion transformer for trajectory-conditioned rendering. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Interactive video…

25
Hugging Face Daily Papers research 2d ago

SafePyramid: A Hierarchical Benchmark for In-context Policy Guardrailing

Abstract: SafePyramid benchmark evaluates guardrail systems' ability to identify safety violations through in-context policy specification across multiple domains and complexity levels. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) In real-world applications, guardrails are often…

5
Hugging Face Daily Papers research 2d ago

PoseShield: Neural Collision Fields for Human Self-Collision Resolution

Abstract: PoseShield addresses self-collision issues in SMPL-based human pose estimation by applying neural collision constraints in pose space through constrained optimization and Eikonal regularization. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Self-collision remains a…

15
Hugging Face Daily Papers research 2d ago

Orca: The World is in Your Mind

Abstract: Orca establishes a unified world latent space through next-state-prediction modeling using multimodal data and demonstrates superior performance in downstream tasks compared to specialized baselines. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) We introduce Orca, an…

38
Hugging Face Daily Papers research 2d ago

ReFreeKV: Towards Threshold-Free KV Cache Compression

Abstract: ReFreeKV addresses the limitations of threshold-dependent KV cache pruning by introducing a threshold-free approach that adaptively allocates compression budgets while maintaining full-cache performance across diverse datasets and model sizes. Generated by…

31
Hugging Face Daily Papers research 2d ago

Monte Carlo Energy Aggregation for Mobile 3D Gaussian Splatting

Abstract: Flux-GS enables real-time high-fidelity 3D Gaussian Splatting on mobile platforms through efficient lighting representation, attribute-conditioned enhancement, and multi-view densification strategies. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Recent advances in 3D…

10
Hugging Face Daily Papers research 2d ago

Nemotron-Labs-Diffusion-Image: Advancing Masked Discrete Diffusion for High-Resolution Image Synthesis

Abstract: A masked discrete diffusion model for text-to-image synthesis that addresses limitations in token refinement and training efficiency through novel mechanisms and optimizations. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) We propose Nemotron-Labs-Diffusion-Image, a…

25
Hugging Face Daily Papers research 2d ago

PolicyGuard: A Dialogue-Grounded Sub-Agent Verifier for Policy Adherence in LLM Agents

Abstract: POLICYGUARD is a sub-agent verifier that enhances LLM agent policy adherence by providing contextual reasoning and conversation-specific feedback across multi-turn interactions. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) LLM agents handle user requests on behalf of…

11
TechCrunch — AI news-outlet 2d ago

Vibe coding platform Base44 launches own model as AI startups seek defensibility

Wix-owned vibe coding platform Base44 has started rolling out its own AI model — with hopes that it will eventually outperform frontier models.

8
Hugging Face Daily Papers research 2d ago

GUICrafter: Weakly-Supervised GUI Agent Leveraging Massive Unannotated Screenshots

Abstract: GUICrafter addresses GUI agent data challenges through a weakly-supervised approach using unannotated screenshots and a two-stage curriculum learning framework for visual grounding and reinforcement learning calibration. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) …

10
Hugging Face Daily Papers research 2d ago

Cognitive Episodes in LLM Reasoning Traces Enable Interpretable Human Item Difficulty Prediction

Abstract: Epi2Diff framework transforms LRM reasoning traces into cognitive episodes to predict human item difficulty more accurately than existing methods. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Predicting human item difficulty is central to educational assessment, where…

8
Hugging Face Daily Papers research 2d ago

MIMFlow: Integrating Masked Image Modeling with Normalizing Flows for End-to-End Image Generation

Abstract: MIMFlow combines Normalizing Flows with Masked Image Modeling to improve generative modeling by decoupling semantic representation from pixel-level details, achieving better performance with fewer tokens. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Normalizing Flows…

37
r/LocalLLaMA community 2d ago

How I'm using local models from real-world coding

Just want to share since after many attempts over the past year, I finally have a setup I kinda like and does useful work for me. I only have 32GB of RAM and a 4070 8GB (laptop), just very ordinary hardware. I found that Qwen3.6-35B-A3B runs reliably at about 15 tokens per…

25
r/LocalLLaMA community 2d ago

Been running Qwen3.6-27B through a 3-critic harness. The harness matters more than I thought

Been running Qwen3.6-27B (8-bit) through my coding harness for a few days, alongside GLM5.2. The harness uses 3 critics — code review, test review, Playwright e2e — each with fresh context before accepting output. Qwen3.6 is legit for a 27B dense model. Benchmarks weren't lying.…

19
Vercel — AI dev-tools 2d ago

Run multiple frameworks in one project with Vercel Services

You can now deploy multiple frontends and backends together within a single Vercel project. Vercel Services is now available, allowing you to deploy full-stack apps with multiple frameworks on a shared domain, where services talk to each other privately and deployments build,…

29
Vercel — AI dev-tools 2d ago

Introducing VCR: Vercel Container Registry

You can now push, pull, and manage container images directly on Vercel. Vercel Container Registry is an OCI-compliant image registry hosted on Vercel's infrastructure. It works with standard workflows - simply docker push, docker pull, and docker tag - so there's nothing new…

37
Vercel — AI dev-tools 2d ago

Vercel Sandbox now supports Custom Images

Vercel Sandboxes now supports custom images. Launching in public beta today, images allow Sandboxes to start with your own custom root filesystem. Images are pulled from Vercel Container Registry, so anything you docker push is immediately available. Bring your own OS,…

21
Vercel — AI dev-tools 2d ago

Nano Banana 2 Lite (Gemini 3.1 Flash Lite Image) now on AI Gateway

Nano Banana 2 Lite from Google is now available on AI Gateway. This Flash-Lite-tier image model is built for fast, low-cost generation. It generates images alongside text in The cost is also lower than previous Nano Banana models. Nano Banana 2 Lite generates 1K images at…

17
OpenAI official-blog 2d ago

Introducing GeneBench-Pro

Introducing GeneBench-Pro, a new benchmark testing AI performance in genomics, biology, and scientific research using complex, real-world datasets.

22
Vercel — AI dev-tools 2d ago

Claude Sonnet 5 now available on Vercel AI Gateway

Claude Sonnet 5 from Anthropic is now available on AI Gateway. Sonnet 5 improves on Sonnet 4.6 across coding and agentic work, reaching outcomes on many tasks that previously needed an Opus model, at Sonnet pricing. The model is more agentic and follows instructions more…

14
Vercel — AI dev-tools 2d ago

Vercel Private Blob is now generally available

Vercel Private Blob is now generally available for all plans. Store sensitive files like user-uploaded photos, invoices, and agent memory, and control exactly who can read them. Private stores, Signed URLs, and OIDC authentication all graduate from beta with this release. Vercel…

22
Vercel — AI dev-tools 2d ago

An expanded Vercel Agent: chat, investigations, and approved actions, now in public beta

Today, we're launching expanded capabilities for Vercel Agent in public beta. Vercel Agent now lives in your dashboard and can investigate production issues, answer questions about your projects, and take action on your behalf. Because Agent runs inside the platform that deploys…

27
r/LocalLLaMA community 2d ago

Introducing LongCat-2.0, a large-scale MoE language model with 1.6 trillion total parameters and ~48 billion activated per token. This was the stealth model that was on OpenRouter under the name 'owl-alpha'.

submitted by /u/AnticitizenPrime

18
r/LocalLLaMA community 2d ago

Ornith 35B works reasonably well with Qwen3.6 35B DFlash speculative model

I saw a solid 30-40% token gen increase from this: ./llama-server --no-mmap --port 8080 --host 0.0.0.0 -kvu -ts 75,70 \ --alias qwen -hf bartowski/deepreinforce-ai_Ornith-1.0-35B-GGUF:Q8_0 -sm layer -c 255000 -cram 0 \ -ctk f16 -ctv f16 -fa 1 --jinja -t 7 --metrics --temp 0.6…

12
LangChain releases dev-tools 2d ago

langchain-openrouter==0.2.5

Changes since langchain-openrouter==0.2.4: release(openrouter): 0.2.5 (#38553); fix(openrouter): deduplicate repeated finish metadata (#38552); fix(openrouter): strip Responses reasoning IDs (#38383)

32
r/LocalLLaMA community 2d ago

It’s time, Sam, it’s time.

I mean….. I’m no CEO…. but it seems like this would be the absolute perfect time to drop a super powerful GPT-OSS-2 to throw a big ol’ wet blanket on Anthropic’s IPO. It doesn’t need to be like frontier or anything, just a 20b and a 120b that is as fast as the old versions, add…

31
Ollama releases dev-tools 2d ago

v0.31.0

launch: check for min version for hermes desktop (#16912)

4
r/LocalLLaMA community 2d ago

DeepSeek V4, PR merged into llama.cpp!

The PR: https://github.com/ggml-org/llama.cpp/pull/24162 All that's left is to git pull, run cmake, and download GGUFs! On your marks, get set, go! submitted by /u/Squik67

4
r/LocalLLaMA community 2d ago

Qwen3-tts.cpp + Compose Desktop GUI

I improved my qwen3-tts.cpp implementation to be about 5x realtime on my RTX 5080. It is GGML based, so it should compile and run anywhere - however I only tested it with CPU & CUDA under Windows & Linux: https://github.com/Danmoreng/qwen3-tts.cpp Additionally I made a Desktop…

13
TechCrunch — AI news-outlet 2d ago

Anthropic and Gov. Newsom forge deal allowing California government to use Claude at half price

As Anthropic forges a closer relationship with the state of California, the federal government has made an enemy out of the OpenAI rival.

26
TechCrunch — AI news-outlet 2d ago

Arena, the AI leaderboard everyone uses, is now a $100M business

The startup, which runs a popular free AI leaderboard, launched its commercial service just last September.

23
Hacker News — AI on Front Page community 2d ago

Qwen 3.6 27B is the sweet spot for local development

Article URL: https://quesma.com/blog/qwen-36-is-awesome/ Comments URL: https://news.ycombinator.com/item?id=48721903 Points: 204 # Comments: 133

7
TechCrunch — AI news-outlet 2d ago

Cursor now has a mobile app for guiding your coding agent on the go

Cursor has launched a new mobile app for remote oversight over coding agents.

29
r/MachineLearning community 2d ago

I'm trying to implement CALM paper, and I have some questions. [P]

Hello, I'm trying to implement Pocket TTS by kyutai-labs, as described in this paper. Since they haven't released the training/fine-tuning code, I'm trying to implement it on my own to learn some stuff. I have read the paper, tried to implement it with much more…

34
Simon Willison community 2d ago

Ornith-1.0: Self-Scaffolding LLMs for Agentic Coding

This is an interesting new open weights (MIT licensed) model, the first model release from DeepReinforce. [...] with variants including 9B Dense, 31B Dense, 35B MoE, and 397B MoE. Built on top of pretrained Gemma 4 and Qwen…

5
