| ▲ | dist-epoch 2 hours ago | |
> but which requires all parallelism to be statically declared ahead of time. This is what all specialized chips like TPU/Cerebras require today, and it allows for better optimization than a generic CPU, since you can "waste" 30 min figuring out the perfect routing/sequencing of operations instead of doing it in the CPU in nanoseconds/cycles. Another benefit is that you can throw away all the CPU out-of-order/branch prediction logic and put useful matrix multipliers in its place. | ||
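
To make the "figure out the sequencing ahead of time" point concrete, here is a toy sketch of a compile-time scheduler. Everything in it is an assumption for illustration: the name `static_schedule`, the one-cycle-per-op cost model, and the greedy heuristic are hypothetical and not how TPU or Cerebras toolchains actually work. The point is only that when the whole dependency graph is known up front, each op can be pinned to a (cycle, unit) slot before the program runs, so the hardware needs no runtime reordering logic.

```python
def static_schedule(deps, num_units):
    """Greedy ahead-of-time list scheduling (toy model).

    deps maps each op name to the list of ops it depends on.
    Assumption: every op takes exactly one cycle on any unit.
    Returns a dict: op name -> (cycle, unit).
    """
    remaining = {op: set(d) for op, d in deps.items()}
    done = set()
    schedule = {}
    cycle = 0
    while remaining:
        # Ops whose inputs were all produced in earlier cycles.
        ready = sorted(op for op, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown op")
        # Issue at most one op per unit in this cycle.
        issued = ready[:num_units]
        for unit, op in enumerate(issued):
            schedule[op] = (cycle, unit)
            del remaining[op]
        done.update(issued)
        cycle += 1
    return schedule


if __name__ == "__main__":
    graph = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
    for op, slot in static_schedule(graph, num_units=2).items():
        print(op, slot)
```

A real compiler would spend its time budget searching over many such placements (and over data routing) rather than taking the first greedy one; this sketch only shows the shape of the output, a fixed timetable decided before execution.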