Remix.run Logo
SamFold 4 days ago

There’s a difference between raw numbers on paper and actual real world differences when training frontier models.

There’s a reason no frontier lab using AMD models for training, because the raw benchmarks for performance for a single chip for a single operation type don’t translate to performance during an actual full training run.

FuriouslyAdrift 4 days ago | parent [-]

Meta, in particular, is heavily using AMDs for inference training.

Also, anyone doing very large models tend to prefer AMDs because they have 288GB per chip and outperform for very large models.

Outside of these use cases, it’s a toss up.

AMD is also much more aligned with the supercomputing (HPC) world were they are dominant (AMD cpus and GPUs power around 140 of the top 500 HPC systems and 8 of the top 10 most energy efficient)