r/MachineLearning · 2 min read

Spot/interruptible H100 and A100 pricing across RunPod, Vast.ai, and AWS - June 2026 data [D]

Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.

Following up on the on-demand comparison from a couple weeks back - pulled spot/interruptible pricing this time, since that's where the real savings conversation lives for anyone running checkpointed training or batch jobs.

Checked: June 2026. Spot/interruptible tier, single GPU.

~ H100 80GB - Spot/Interruptible

  • RunPod: $1.80-$2.40/hr. Community spot, can terminate without notice.
  • Vast.ai: $1.47-$2.00/hr (seen as low as $1.03/hr in thinner markets). Wide range, host-dependent.
  • AWS (P5, spot): $2.50-$3.10/hr. Technically available, but extremely limited and frequently unavailable at any price.

~ A100 80GB - Spot/Interruptible

  • RunPod: community spot as low as $0.20-$0.40/hr (high variance). Reliability drops fast at this end.
  • Vast.ai: $0.67/hr typical, lower on less reliable hosts. Marketplace bidding, varies by host score.
  • AWS (P4d, spot): ~$1.00-$1.50/hr. More consistently available than P5 spot.

What stood out:

  • The spot discount vs on-demand is real - 40-60% off on H100, sometimes more on A100 - but the spread between providers on spot is much wider than on-demand. You're not comparing apples to apples, you're comparing apples to "whatever fell off the truck this hour."
  • AWS spot for H100 (P5) is more of a theoretical price point than a practical one right now - availability is thin enough that "checked the price" and "could actually get one" are two different questions.
  • Vast.ai's floor prices look incredible until you check host reliability scores. The $0.67/hr A100 and the $1.50/hr A100 are not the same product even though they're listed the same way.
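
To put a rough number on that last point, here's a minimal sketch of "price per hour of work that actually survives." The two list prices are the A100 figures above; the useful fractions (share of paid time that ends up as saved progress) are made-up placeholders, not measurements, and the function names are just for illustration.

```python
def effective_hourly_cost(list_price: float, useful_fraction: float) -> float:
    """Cost per hour of surviving work, given the share of paid time that is useful."""
    if not 0.0 < useful_fraction <= 1.0:
        raise ValueError("useful_fraction must be in (0, 1]")
    return list_price / useful_fraction


def break_even_useful_fraction(cheap_price: float, pricey_price: float,
                               pricey_useful_fraction: float = 1.0) -> float:
    """Useful fraction the cheap host must exceed to beat the pricier host."""
    return cheap_price * pricey_useful_fraction / pricey_price


# List prices come from the post; the useful fractions are hypothetical.
cheap = effective_hourly_cost(0.67, 0.60)    # flaky low-score host (assumed 60% useful)
steady = effective_hourly_cost(1.50, 0.98)   # high-score host (assumed 98% useful)
threshold = break_even_useful_fraction(0.67, 1.50, 0.98)

print(f"cheap: ${cheap:.2f}/useful hr, steady: ${steady:.2f}/useful hr")
print(f"cheap host wins if more than {threshold:.0%} of its paid time is useful")
```

Under these assumed numbers the cheap listing still comes out ahead on cost, so the real question is whether your job can tolerate the restarts at all, not just the sticker price.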

This tier only makes sense if your job checkpoints well. For anything customer-facing or latency-sensitive, spot isn't worth the risk regardless of price.
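
A back-of-envelope way to check "checkpoints well," sketched under loud assumptions: each interruption loses on average half a checkpoint interval of work plus a fixed restart time, restart time is billed, and the time spent writing checkpoints is ignored. The interruption rate, intervals, and restart time below are hypothetical inputs, not observed data.

```python
def useful_fraction(interruptions_per_hour: float,
                    checkpoint_interval_min: float,
                    restart_min: float) -> float:
    """Rough share of paid time that becomes saved progress.

    Assumes each interruption costs half a checkpoint interval of
    lost work plus a fixed restart time. Checkpoint write time is ignored.
    """
    lost_min_per_hour = interruptions_per_hour * (
        checkpoint_interval_min / 2 + restart_min
    )
    return max(0.0, 1.0 - lost_min_per_hour / 60.0)


# Hypothetical: one interruption every two hours, 10 minutes to get running again.
frequent = useful_fraction(0.5, checkpoint_interval_min=30, restart_min=10)
sparse = useful_fraction(0.5, checkpoint_interval_min=120, restart_min=10)

print(f"checkpoint every 30 min:  {frequent:.0%} of paid time is useful")
print(f"checkpoint every 120 min: {sparse:.0%} of paid time is useful")
```

With the same interruption rate, the checkpoint interval alone moves the useful share a lot in this toy model, which is why the same spot price can be a bargain for one job and a loss for another.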

Not selling anything, just tracking this for my own training runs and figured others here are doing the same math. Anyone actually running production batch jobs on spot right now? Curious what interruption rates you're actually seeing vs what's advertised.

submitted by /u/Shot-Calligrapher166