Tag: Inference
359 articles archived under #inference

NVIDIA Developer Blog · official-blog · 4mo ago · 30
Unlock Massive Token Throughput with GPU Fractioning in NVIDIA Run:ai
As AI workloads scale, achieving high throughput, efficient resource usage, and predictable latency becomes essential. NVIDIA Run:ai addresses these challenges...

Smol AI News · news-outlet · 4mo ago · 18
Z.ai GLM-5: New SOTA Open Weights LLM
**Zhipu AI** launched **GLM-5**, an **Opus-class** model scaling from **355B to 744B parameters** with **DeepSeek Sparse Attention** integration for cost-efficient long-context serving. GLM-5 achieves **SOTA on BrowseComp** and leads on **Vending Bench 2**, focusing on office…

NVIDIA Developer Blog · official-blog · 4mo ago · 31
Automating Inference Optimizations with NVIDIA TensorRT LLM AutoDeploy
NVIDIA TensorRT LLM enables developers to build high-performance inference engines for large language models (LLMs), but deploying a new architecture...

Smol AI News · news-outlet · 4mo ago · 23
ElevenLabs $500m Series D at $11B, Cerebras $1B Series H at $23B, Vibe Coding -> Agentic Engineering
**Google's Gemini 3** is being integrated widely, including a new **Chrome side panel** and **Nano Banana** UX features, with rapid adoption and a **78% unit-cost reduction** in serving costs. The **Gemini app** reached **750M+ MAU** in Q4 2025, nearing ChatGPT's user base.…

Smol AI News · news-outlet · 4mo ago · 28
Context Graphs: Hype or actually Trillion-dollar opportunity?
**Zhipu AI** launched **GLM-OCR**, a lightweight **0.9B** multimodal OCR model excelling in complex document understanding with top benchmark scores and day-0 deployment support from **lmsys**, **vllm**, and **novita labs**. **Ollama** enabled local-first usage with easy offline…

Smol AI News · news-outlet · 5mo ago · 4
Open Responses: explicit spec for OpenAI's Responses API supported by OpenRouter, Ollama, Huggingface, vLLM, et al
**OpenAI** launched the **Open Responses** API spec, an open-source, multi-provider standard for interoperable LLM APIs designed to simplify agent stacks and tooling. Early adopters like **ollama** and **vLLM** support the spec, while notable absences include **anthropic** and…

Smol AI News · news-outlet · 6mo ago · 30
Meta Superintelligence Labs acquires Manus AI for over $2B, at $100M ARR, 9months after launch
**Manus** achieved a rapid growth trajectory in 2025, raising **$500M** from Benchmark and reaching **$100M ARR** before being acquired by **Meta** for an estimated **$4B**. The **vLLM** team launched a dedicated community site with new resources, while performance issues with…

Smol AI News · news-outlet · 6mo ago · 18
not much happened today
**GLM-4.7** and **MiniMax M2.1** open-weight model releases highlight day-0 ecosystem support, coding throughput, and agent workflows, with GLM-4.7 achieving a +9.5% improvement over GLM-4.6 and MiniMax M2.1 positioned as an OSS Claude-like MoE model with 230B total parameters…

Eugene Yan · research · 40mo ago · 5
How to Write Data Labeling/Annotation Guidelines
Writing good instructions to achieve high precision and throughput.