kingstnap | 6 hours ago
You underestimate the amount of inference and very much overestimate what training is. Training is more or less the same as doing inference on an input token twice (forward and backward pass). But because it's offline and predictable, it can be done fully batched with very high utilization (i.e. efficiently). Training is, at a guess, maybe 100 trillion total tokens, but these guys apparently do inference at the quadrillion-tokens-per-month scale.
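A rough back-of-envelope sketch of this comparison, using the comment's numbers (100 trillion training tokens, ~1 quadrillion inference tokens per month) and the commonly cited FLOP rules of thumb (~6·N·D for training, ~2·N·D for inference, i.e. training is about 2-3x inference per token). The parameter count is a made-up placeholder; the point is only the ratio.

```python
# Back-of-envelope: one-time training compute vs. monthly inference compute.
# Numbers are guesses from the comment; N is a hypothetical model size.

N = 1e12                        # assumed parameter count (placeholder, ~1T params)
TRAIN_TOKENS = 100e12           # ~100 trillion training tokens (guess from the comment)
INFER_TOKENS_PER_MONTH = 1e15   # ~1 quadrillion inference tokens/month (from the comment)

train_flops = 6 * N * TRAIN_TOKENS                    # forward + backward pass per token
infer_flops_per_month = 2 * N * INFER_TOKENS_PER_MONTH  # forward pass only

print(f"training (one-time):      {train_flops:.1e} FLOPs")
print(f"inference (per month):    {infer_flops_per_month:.1e} FLOPs")
print(f"monthly inference / total training: {infer_flops_per_month / train_flops:.1f}x")
```

Under these assumptions a single month of inference already costs a few times the entire training run in raw FLOPs, and that is before accounting for training being run at much higher hardware utilization than latency-sensitive serving.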