Tag

Hardware

281 articles archived under #hardware · RSS

Dwarkesh Podcast news-outlet 1mo ago

Reiner Pope – Chip design from the bottom up

Working up from basic logic gates to why GPUs, TPUs, FPGAs, and the human brain each look the way they do.

22
arXiv — Machine Learning research 1mo ago

TONIC: Token-Centric Semantic Communication for Task-Oriented Wireless Systems

arXiv:2605.21553v1 Announce Type: new Abstract: Tokens are becoming the basic units through which foundation models represent and process information for understanding and inference. However, traditional wireless communication, centered on bit-level fidelity, faces a mismatch…

33
r/LocalLLaMA community 1mo ago

When your LLM treats data center GPUs like an optional DLC

  submitted by   /u/noprompt [link]   [comments]

10
Hugging Face Daily Papers research 1mo ago

Capturing LLM Capabilities via Evidence-Calibrated Query Clustering

Abstract Query clustering algorithm ECC improves LLM capability evaluation by aligning semantic embeddings with latent capability demands through posterior model comparisons and Bradley-Terry modeling. AI-generated summary Query clustering organizes queries into groups that…

13
Ars Technica — AI news-outlet 1mo ago

As Grok flounders, SpaceX bets future on beating Big Tech at AI

SpaceX IPO filing pitches orbital data centers as Grok lags rival AI services.

26
r/LocalLLaMA community 1mo ago

Qwen3.6 35Ba3 has changed my workflows and even how I use my computer

My workflow has changed basically to ask Codex to do certain tasks and then document how to do them (including errors it found on its way) into a skill. I feed that skill to pi, and suddenly my qwen3.6 gets that hard stuff done: - devops on a VPS - using docling to create epubs…

33
Google DeepMind official-blog 1mo ago

We’re launching the Google DeepMind Accelerator program in Asia Pacific to tackle environmental risks

The Asia-Pacific region is a global engine for economic growth, but it's also highly vulnerable to climate change. While green technologies are gaining momentum, a recent report shows they aren’t scaling fast enough to keep up with the region’s rising environmental risks. To…

22
r/MachineLearning community 1mo ago

Can liveness detection models generalise to synthetic media generation techniques they were never trained on? [D]

Most liveness detection systems in production today were built around a threat model where the attacker is submitting a static image or a basic replay video. The generation quality of current synthetic media is categorically different from what those training datasets captured.…

32
NVIDIA Developer Blog official-blog 1mo ago

Get Real-Time Visibility into GPU Usage Across Kubernetes Clusters

Maximizing the value of AI infrastructure demands deep visibility into GPU utilization. Yet many platform teams running AI workloads on Kubernetes operate with...

25
r/MachineLearning community 1mo ago

I created an LLM post-training method called RPS. Preliminary results show that it improved Qwen3-8b's program synthesis reliability. [R]

RPS is inspired by neuroscience. As humans, we learn basic skills as kids with high neuro-plasticity. We then learn advanced skills as teens and adults with low neuro-plasticity. RPS trains a model in 2 stages. In stage 1, the model is trained on easy data with high learning…

26
Hugging Face Daily Papers research 1mo ago

CutVerse: A Compositional GUI Agents Benchmark for Media Post-Production Editing

Abstract Current GUI agents show limited effectiveness in professional media post-production tasks despite advances in spatial grounding and multimodal alignment. AI-generated summary While GUI agents have made significant progress in web navigation and basic operating system…

13
arXiv — Machine Learning research 1mo ago

Unsupervised clustering and classification of upper limb EMG signals during functional movements: a data-driven

arXiv:2605.20599v1 Announce Type: new Abstract: This study presents a comprehensive approach for the clustering and classification of upper-limb surface electromyography (sEMG) signals during functional reach and grasp movements. The methodology was applied to the NINAPRO DB4…

18
arXiv — NLP / Computation & Language research 1mo ago

Post-Hoc Understanding of Metaphor Processing in Decoder-Only Language Models via Conditional Scale Entropy

arXiv:2605.21391v1 Announce Type: new Abstract: Metaphor requires a language model to resolve a token whose contextual meaning diverges from its basic literal sense. Understanding how transformer models organize this reinterpretation across depth remains an open problem in…

19
The Information — AI news-outlet 1mo ago

Anthropic and SpaceX Detail Compute Deal Worth Up to $40 Billion

Anthropic could pay SpaceX up to $40 billion over the next several years to use compute from data centers, but either company has the power to call off the deal early, SpaceX revealed when it filed for an initial public offering on Wednesday. SpaceX is receiving $1.25 billion…

16
Latent.Space news-outlet 1mo ago

Railway: The Agent-Native Cloud — Jake Cooper

3M Users, 100K Signups/Week, Own-Metal Data Centers, $200K+ Coding Agent Spend, and the Death of PRs

21
TechCrunch — AI news-outlet 1mo ago

Musk’s xAI is being sued over its data center generators. Now, it’s buying $2.8B more.

Elon Muks's xAI said it will buy $2.8 billion worth of natural gas turbines over the next three years, according to SpaceX's IPO filing.

6
r/LocalLLaMA community 1mo ago

24GB M4 Mac - is Qwen 9B only option while system is running?

I have mac at work that I want to use local model for prototyping and basic prompts that needs to stay on device. What sort of model I can run that I can fit at least 64k context ? Any setups share or guides welcome. I need to have firefox open with one tab at minium. Problem I…

6
The Information — AI news-outlet 1mo ago

Sam Altman Offers YC Founders $2 Million in OpenAI Tokens For Equity

OpenAI cofounder and CEO Sam Altman late Tuesday offered to invest $2 million in every startup currently in the Y Combinator startup accelerator program—not in cash, but in OpenAI tokens. “I am excited to see what will happen with tokenmaxxing startups, both for how they work…

13
arXiv — Machine Learning research 1mo ago

DynaTrain: Fast Online Parallelism Switching for Elastic LLM Training

arXiv:2605.18815v1 Announce Type: new Abstract: Modern large language model (LLM) training is inherently dynamic: resource fluctuations, RLHF phase shifts, and cluster elasticity continually reshape the optimal parallelism layout, posing a significant challenge to existing…

22
arXiv — Machine Learning research 1mo ago

A Multi-Dimensional Clustering Approach for Identifying Inborn Errors of Immunity

arXiv:2605.18880v1 Announce Type: new Abstract: Rare diseases such as inborn errors of immunity (IEI) require early diagnosis to prevent end organ damage and improve quality of life. Hurdles in accessing and curating large scale electronic health record (EHR) data limit routine…

10
arXiv — NLP / Computation & Language research 1mo ago

Position: Uncertainty Quantification in LLMs is Just Unsupervised Clustering

arXiv:2605.19220v1 Announce Type: new Abstract: Uncertainty Quantification (UQ) is widely regarded as the primary safeguard for deploying Large Language Models (LLMs) in high-stakes domains. However, we argue that the field suffers from a category error: mainstream UQ methods…

22
arXiv — NLP / Computation & Language research 1mo ago

ClusterRAG: Cluster-Based Collaborative Filtering for Personalized Retrieval-Augmented Generation

arXiv:2605.18769v1 Announce Type: cross Abstract: Personalized Retrieval-Augmented Generation (RAG) relies on accurately selecting user-relevant documents. In practice, existing RAG approaches often suffer from high retrieval costs and overlook that collaborative signals from…

36
r/LocalLLaMA community 1mo ago

Running DeepSeek-V4 locally with 4x legacy RTX 2080 Ti ($2k budget setup). Custom Turing kernels, W8A8 quantization, and 255 prefill tok/s!

Hey r/DeepSeek , Who says we need an H100 cluster or the latest expensive GPUs to run frontier MoE models? I wanted to see how far we could push a single node of consumer legacy hardware, so we spent less than $2,500 total to build a budget machine that successfully runs…

29
r/LocalLLaMA community 1mo ago

Intel's Crescent Island PCB Leaks, Showing a Massive Xe3P GPU, 16-Pin Connector, 160GB LPDDR5X as Intel Sidesteps the HBM Shortage

Upcoming Intel Xe3P data center GPU with 20 8GBLPDDR5X modules for a total of 160GB, bypassing HBM shortages. Assuming a 32-bit interface, that's a 640-bit wide memory interface, or 10 channel memory interface if converted to the 64-bit wide desktop equivalent. At 8800-9500MT,…

35
Ars Technica — AI news-outlet 1mo ago

Electrical utility megamerger is all about the data centers

NextEra’s blockbuster deal with Dominion likely means higher bills for consumers.

29
arXiv — Machine Learning research 1mo ago

AdaGraph: A Graph-Native Clustering Algorithm That Overcomes the Curse of Dimensionality and Enables Scientific Discovery

arXiv:2605.16320v1 Announce Type: new Abstract: We present AdaGraph, a graph-native clustering algorithm born from the Structure-Centric Machine Learning (SC-ML) paradigm -- a new field of unsupervised learning that replaces geometry-centric (distance-based) computation with…

24
arXiv — Machine Learning research 1mo ago

HPC-LLM: Practical Domain Adaptation and Retrieval-Augmented Generation for HPC Support

arXiv:2605.16347v1 Announce Type: new Abstract: Modern scientific research increasingly depends on High-Performance Computing (HPC) infrastructures, yet many researchers face significant operational barriers when interacting with cluster environments, job schedulers, GPU…

12
The Information — AI news-outlet 1mo ago

The ‘Price is Right’ for GPUs: The Startup Turning Nvidia Chips Into ‘Boring’ Bankable Assets

Good morning, Anissa here! The AI boom requires an enormous amount of new power plants, chips and data centers. But it also requires something that’s less visible: new financial plumbing that makes it possible to lend against this costly hardware. Miami fintech startup Barkr is…

38
The Information — AI news-outlet 1mo ago

Edge Inference Chip Startup SiMa.ai Raising at $1.4 Billion Valuation

Nvidia might be on a tear, but some investors are still convinced that there’s demand for another kind of specialized chips. And they’re putting their money where their mouth is. For example: San Jose, Calif.-based SiMa.ai , which develops chips that work on devices such as…

14
Stratechery (Ben Thompson) community 1mo ago

Data Center Discontent, Understanding the Opposition, Fixing the Problem

There are understandable reasons for people to oppose data centers; the only solution that will work is simply paying them off.

4
arXiv — NLP / Computation & Language research 1mo ago

Automatic Construction of a Legal Citation Graph from 100 Million Ukrainian Court Decisions: Large-Scale Extraction, Topological Analysis, and Ontology-Driven Clustering

arXiv:2605.15362v1 Announce Type: new Abstract: Half a billion citation edges extracted from 100.7 million Ukrainian court decisions reveal that judicial citation structure encodes legal domain boundaries without supervision and predicts future legislative importance with…

19
r/LocalLLaMA community 1mo ago

Qwen 3.6 27B Q8 on four Nvidia RTX A4000 (16GB each) with Llama.cpp and MTP enabled

Qwen 3.6 27B Q8 on four Nvidia RTX A4000 (16GB each) with Llama.cpp and MTP enabled My setup is heterogenous, I originally acquired my server (Lenovo ThinkStation P3 Tower Gen 2) to run OpenShift/K8s clusters (because I work on that), and later on I started purchasing one by one…

14
r/LocalLLaMA community 1mo ago

I trained TIME: short context-triggered thinking on Qwen model instead of overthinking

Started this as a personal project for my Open-WebUI setup to use. Somehow it ended up as an ACL 2026 paper. Not some lab paper, it is personal solo independent paper that happened. TIME is basically my attempt to train Qwen3 models to think in short bursts wherever the response…

29
r/LocalLLaMA community 1mo ago

Benchmarking vLLM vs SGLang vs llama.cpp on a mixed Blackwell/Ada cluster

I have been running some benchmarks on a heterogeneous 7-GPU cluster to see how different inference engines handle long context prefill using pipeline parallelism. My setup consists of a mix of Blackwell and Ada cards: one RTX PRO 6000 96GB, one PRO 5000 48GB, two 5090 32GB, and…

4
r/LocalLLaMA community 1mo ago

The power of structured workflows and small local models

A month ago, I experimented with a very basic home-rolled agent loop with a handful of tools and found it worked surprisingly well in spite of how crude it was: https://www.reddit.com/r/LocalLLaMA/comments/1sl7f8e/homerolled_loop_agent_is_surprisingly_effective/ Later, I wrote…

15
Dwarkesh Podcast news-outlet 1mo ago

Notes on pretraining parallelisms and failed training runs.

Deeply researched interviews

37
Hacker News — AI on Front Page community 1mo ago

Bun Rust rewrite: "codebase fails basic miri checks, allows for UB in safe rust"

Article URL: https://github.com/oven-sh/bun/issues/30719 Comments URL: https://news.ycombinator.com/item?id=48150900 Points: 246 # Comments: 154

22
The Algorithmic Bridge news-outlet 1mo ago

Weekly Top Picks #121

Hottest AI job: FDEs / Trump in China / No jobpocalypse / AI models fixing benchmarks / Americans agree: "no datacenters here" / Claude 3 vs Claude 4

14
Ars Technica — AI news-outlet 1mo ago

Pennsylvanians use town hall meeting to rail against data center boom

“This is a public trust and transparency issue.”

22
Hacker News — AI on Front Page community 1mo ago

Prolog Basics Explained with Pokémon

Article URL: https://unplannedobsolescence.com/blog/prolog-basics-pokemon/ Comments URL: https://news.ycombinator.com/item?id=48147091 Points: 215 # Comments: 34

34
r/LocalLLaMA community 1mo ago

Important (vision) Qwen3.5 template fix dropped in vllm

Sharing this because I personally had some annoying issues and I can confirm this un-fucked them. Basically once you posted an image in the conversation the model went haywire. Not too badly but annoying   submitted by   /u/Dany0 [link]   [comments]

14
arXiv — Machine Learning research 1mo ago

Towards Resource-Efficient LLMs: End-to-End Energy Accounting of Distillation Pipelines

arXiv:2605.13981v1 Announce Type: new Abstract: The rise in deployment of large language models has driven a surge in GPU demand and datacenter scaling, raising concerns about electricity use, grid stress, and the impacts of modern AI workloads. Distillation is often promoted as…

19
arXiv — Machine Learning research 1mo ago

EnergyLens: Predictive Energy-Aware Exploration for Multi-GPU LLM Inference Optimization

arXiv:2605.14249v1 Announce Type: new Abstract: We present EnergyLens, an end-to-end framework for energy-aware large language model (LLM) inference optimization. As LLMs scale, predicting and reducing their energy footprint has become critical for sustainability and datacenter…

12
arXiv — Machine Learning research 1mo ago

LoMETab: Beyond Rank-1 Ensembles for Tabular Deep Learning

arXiv:2605.14365v1 Announce Type: new Abstract: Recent tabular learning benchmarks increasingly show a tight performance cluster rather than a clear hierarchy among leading methods, spanning gradient boosted decision trees, attention-based architectures, and implicit ensembles…

4
Hugging Face Daily Papers research 1mo ago

LLM-based Detection of Manipulative Political Narratives

Abstract A computational framework combining prompt-based filtering and unsupervised clustering identifies manipulative political narrative clusters from social media posts without requiring predefined categories. AI-generated summary We present a new computational framework for…

9
The Information — AI news-outlet 1mo ago

Newmark Data Center Advisor Brent Mayo Departs for DigitalBridge

Brent Mayo, head of data center capital markets at advisory firm Newmark, has left the firm and told people he is joining investment firm DigitalBridge, according to two people with knowledge of the move. At Newmark, which specialized in commercial real estate, Mayo was a part…

26
Hugging Face Daily Papers research 1mo ago

Federation of Experts: Communication Efficient Distributed Inference for Large Language Models

Abstract Federation of Experts restructures mixture of experts blocks into clusters that process KV heads independently, eliminating inter-node communication bottlenecks while maintaining generation quality. AI-generated summary Mixture of experts has emerged as the primary…

23
Ars Technica — AI news-outlet 1mo ago

Energy supplier abandons Lake Tahoe residents to serve data centers

Town’s 49,000 California residents compete with Nevada data centers for energy.

19
r/MachineLearning community 1mo ago

OpenAI's deployment company move says more about the AI gap than any benchmark[D]

OpenAI launched a deployment company with $4B initial investment, 19 partner organizations, and acquired Tomoro (UK-based AI consultancy, ~150 engineers). The pitch: embed "Forward Deployed Engineers" into enterprises to help them actually use AI. This is basically the Palantir…

35
arXiv — Machine Learning research 1mo ago

scShapeBench: Discovering geometry from high dimensional scRNAseq data

arXiv:2605.12662v1 Announce Type: new Abstract: High-dimensional point cloud data arise across many scientific domains, especially single-cell biology. The shapes or topologies of these datasets determine the types of information that can be extracted. For example, clustered…

32

Reiner Pope – Chip design from the bottom up

TONIC: Token-Centric Semantic Communication for Task-Oriented Wireless Systems

When your LLM treats data center GPUs like an optional DLC

Capturing LLM Capabilities via Evidence-Calibrated Query Clustering

As Grok flounders, SpaceX bets future on beating Big Tech at AI

Qwen3.6 35Ba3 has changed my workflows and even how I use my computer

We’re launching the Google DeepMind Accelerator program in Asia Pacific to tackle environmental risks

Can liveness detection models generalise to synthetic media generation techniques they were never trained on? [D]

Get Real-Time Visibility into GPU Usage Across Kubernetes Clusters

I created an LLM post-training method called RPS. Preliminary results show that it improved Qwen3-8b's program synthesis reliability. [R]

CutVerse: A Compositional GUI Agents Benchmark for Media Post-Production Editing

Unsupervised clustering and classification of upper limb EMG signals during functional movements: a data-driven

Post-Hoc Understanding of Metaphor Processing in Decoder-Only Language Models via Conditional Scale Entropy

Anthropic and SpaceX Detail Compute Deal Worth Up to $40 Billion

Railway: The Agent-Native Cloud — Jake Cooper

Musk’s xAI is being sued over its data center generators. Now, it’s buying $2.8B more.

24GB M4 Mac - is Qwen 9B only option while system is running?

Sam Altman Offers YC Founders $2 Million in OpenAI Tokens For Equity

DynaTrain: Fast Online Parallelism Switching for Elastic LLM Training

A Multi-Dimensional Clustering Approach for Identifying Inborn Errors of Immunity

Position: Uncertainty Quantification in LLMs is Just Unsupervised Clustering

ClusterRAG: Cluster-Based Collaborative Filtering for Personalized Retrieval-Augmented Generation

Running DeepSeek-V4 locally with 4x legacy RTX 2080 Ti ($2k budget setup). Custom Turing kernels, W8A8 quantization, and 255 prefill tok/s!

Intel's Crescent Island PCB Leaks, Showing a Massive Xe3P GPU, 16-Pin Connector, 160GB LPDDR5X as Intel Sidesteps the HBM Shortage

Electrical utility megamerger is all about the data centers

AdaGraph: A Graph-Native Clustering Algorithm That Overcomes the Curse of Dimensionality and Enables Scientific Discovery

HPC-LLM: Practical Domain Adaptation and Retrieval-Augmented Generation for HPC Support

The ‘Price is Right’ for GPUs: The Startup Turning Nvidia Chips Into ‘Boring’ Bankable Assets

Edge Inference Chip Startup SiMa.ai Raising at $1.4 Billion Valuation

Data Center Discontent, Understanding the Opposition, Fixing the Problem

Automatic Construction of a Legal Citation Graph from 100 Million Ukrainian Court Decisions: Large-Scale Extraction, Topological Analysis, and Ontology-Driven Clustering

Qwen 3.6 27B Q8 on four Nvidia RTX A4000 (16GB each) with Llama.cpp and MTP enabled

I trained TIME: short context-triggered thinking on Qwen model instead of overthinking

Benchmarking vLLM vs SGLang vs llama.cpp on a mixed Blackwell/Ada cluster

The power of structured workflows and small local models

Notes on pretraining parallelisms and failed training runs.

Bun Rust rewrite: "codebase fails basic miri checks, allows for UB in safe rust"

Weekly Top Picks #121

Pennsylvanians use town hall meeting to rail against data center boom

Prolog Basics Explained with Pokémon

Important (vision) Qwen3.5 template fix dropped in vllm

Towards Resource-Efficient LLMs: End-to-End Energy Accounting of Distillation Pipelines

EnergyLens: Predictive Energy-Aware Exploration for Multi-GPU LLM Inference Optimization

LoMETab: Beyond Rank-1 Ensembles for Tabular Deep Learning

LLM-based Detection of Manipulative Political Narratives

Newmark Data Center Advisor Brent Mayo Departs for DigitalBridge

Federation of Experts: Communication Efficient Distributed Inference for Large Language Models

Energy supplier abandons Lake Tahoe residents to serve data centers

OpenAI's deployment company move says more about the AI gap than any benchmark[D]

scShapeBench: Discovering geometry from high dimensional scRNAseq data