ismailmaj 3 days ago

Typically there's a wide range of algorithms you could use to compute a matrix multiplication: on one extreme, numerically stable summation; on the other, tiled matmul in FP8. The industry trend seems to be moving further away from numerically stable algorithms, apparently without much quality drop. My claim is probably unfair since it ignores the scale you gain from the speed/precision tradeoff, so I assumed numerical stability is not that beneficial compared to something precision-heavy like physics simulation in HPC.
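
A quick sketch of what I mean by the two extremes, in plain NumPy (values and sizes are illustrative, not from any real workload): naive float32 accumulation vs. Kahan compensated summation, which is the numerically stable end of the spectrum.

    import numpy as np

    def kahan_sum(xs):
        # Compensated summation: carry the per-step rounding error forward
        # so it can be subtracted back out on the next addition.
        total = np.float32(0.0)
        comp = np.float32(0.0)
        for x in xs:
            y = x - comp
            t = np.float32(total + y)
            comp = (t - total) - y
            total = t
        return total

    xs = np.full(1_000_000, 0.1, dtype=np.float32)

    naive = np.float32(0.0)
    for x in xs:
        naive = naive + x  # accumulator stays float32; errors compound

    print(naive)                        # drifts visibly from 100000
    print(kahan_sum(xs))                # much closer to the reference
    print(xs.astype(np.float64).sum())  # float64 reference, ~100000.0015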

Nevermark 3 days ago | parent [-]

> I assumed numerical stability is not that beneficial compared to something precision-heavy like physics simulation in HPC.

Yes, exactly.

For physics, there is a correct result: you want the simulation to track reality with high accuracy over a long chain of calculations. That's an extremely tight constraint.
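
To make the "long chain of calculations" point concrete, a toy sketch (the chaotic logistic map, standing in for a real physics solver): the same recurrence in float32 and float64 decorrelates completely after a few dozen steps.

    import numpy as np

    # Chaotic logistic map: x -> 4x(1 - x). Rounding differences between
    # precisions grow roughly exponentially per step.
    x32 = np.float32(0.3)
    x64 = np.float64(0.3)
    for step in range(1, 61):
        x32 = np.float32(4.0) * x32 * (np.float32(1.0) - x32)
        x64 = 4.0 * x64 * (1.0 - x64)
        if step % 10 == 0:
            print(f"step {step}: float32={x32:.6f}  float64={x64:.6f}")
    # By step ~40 the two trajectories share no digits; a solver with
    # this kind of sensitivity has to manage precision very carefully.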

For deep learning, you don't have any specific constraints on the parameters, except that you want to end up with a combination that fits the data well. There are innumerable combinations of parameter values that will do that; you just need to find one that's good enough.
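
A toy illustration of those "innumerable combinations" (overparameterized linear least squares; the sizes are arbitrary): shifting the weights along the null space of the data matrix changes them a lot while leaving the fit untouched.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((5, 20))  # 5 data points, 20 parameters
    y = rng.standard_normal(5)

    # Minimum-norm solution, then shift it along X's null space:
    # the fit to the data is unchanged.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    null_basis = np.linalg.svd(X)[2][5:].T  # 15 directions X maps to ~0
    w_alt = w + null_basis @ rng.standard_normal(15)

    print(np.abs(X @ w - y).max())      # ~1e-15: fits the data
    print(np.abs(X @ w_alt - y).max())  # ~1e-15: fits just as well
    print(np.abs(w - w_alt).max())      # order 1: very different weights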

Wildly different.