FuriouslyAdrift 4 days ago

By NVIDIA's own numbers and widely available FP8 testing, the AMD MI355X just edges out the NVIDIA B300 (both the top performers) at 10.1 PFLOPs per chip at around 1400 W per chip. Neither of these things is available as a discrete device... you're going to be buying a system, but AMD Instinct systems typically run about 15% less than the comparable NVIDIA ones.

NVIDIA is a very pricey date.
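Back-of-the-envelope math on the figures above. The 10.1 PFLOPs and ~1400 W are the numbers quoted in this thread; the normalized system prices are purely illustrative placeholders for the claimed ~15% delta.

```python
# Efficiency and price-normalized throughput, using the figures quoted above.
PFLOPS_FP8 = 10.1   # peak FP8 PFLOPs per chip (MI355X and B300, per the comment)
POWER_W = 1400      # approximate per-chip power draw

flops_per_watt = PFLOPS_FP8 * 1e15 / POWER_W
print(f"{flops_per_watt / 1e12:.1f} TFLOPs per watt")  # ~7.2 TFLOPs/W

# If an AMD Instinct system costs ~15% less for the same peak throughput,
# that works out to ~18% more peak FLOPs per dollar:
nvidia_price = 1.00                  # normalized, hypothetical
amd_price = nvidia_price * 0.85
print(f"{nvidia_price / amd_price - 1:.0%} more peak FLOPs per dollar")
```

Of course, peak FLOPs per dollar says nothing about delivered utilization, which is where the disagreement below starts.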

https://wccftech.com/mlperf-v5-1-ai-inference-benchmark-show...

https://semianalysis.com/2024/04/10/nvidia-blackwell-perf-tc...

https://semianalysis.com/2025/06/13/amd-advancing-ai-mi350x-...

SamFold 4 days ago

There’s a difference between raw numbers on paper and actual real world differences when training frontier models.

There’s a reason no frontier lab uses AMD GPUs for training: raw single-chip benchmarks for a single operation type don’t translate into performance over an actual full training run.

FuriouslyAdrift 4 days ago

Meta, in particular, is heavily using AMD GPUs for inference workloads.

Also, anyone doing very large models tends to prefer AMD, because the chips have 288 GB of memory each and outperform on very large models.
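A quick sketch of why per-chip memory matters here. The 288 GB figure is from this thread; the parameter count, precision, and the smaller comparison capacity are illustrative assumptions, not claims about any specific competing part.

```python
import math

def min_chips(params_billion: float, bytes_per_param: float, hbm_gb: float) -> int:
    """Minimum chips needed just to hold the weights
    (ignores KV cache, activations, and optimizer state)."""
    return math.ceil(params_billion * bytes_per_param / hbm_gb)

# Hypothetical 1T-parameter model served in FP8 (1 byte per parameter):
print(min_chips(1000, 1.0, 288))  # 4 chips at 288 GB each
print(min_chips(1000, 1.0, 180))  # 6 chips at a hypothetical 180 GB each
```

Fewer chips per model replica means less cross-chip communication per token, which is where the large-model advantage shows up.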

Outside of these use cases, it’s a toss up.

AMD is also much more aligned with the supercomputing (HPC) world, where it is dominant: AMD CPUs and GPUs power around 140 of the TOP500 HPC systems and 8 of the 10 most energy-efficient.