dvt 5 hours ago

> I think what they're getting at is that for a given unit of compute, this method achieves 125% performance.

This is not what they're getting at; I explained exactly what they're getting at. I mean, equating "loss" (what the authors actually measured) with "performance", as you do, is just bizarre. We use benchmarks to measure performance, and the numbers there were only about 1-5% better (apart from the GPQA-Diamond outlier).

Do people even read these papers?