gajjanag 5 days ago
As others have pointed out, these phenomena are well known to many folks across companies in the AI infra space, so the article doesn't really break new ground. It is a good exposition of the basic strategies, though. What I would have loved is a discussion of collectives and multi-node setups, and a demonstration of how to get determinism at a low performance penalty for multi-node reduction collectives.