thesz 6 hours ago

5 days ago: https://news.ycombinator.com/item?id=45926371

Sparse models achieve the same quality of results but have fewer coefficients to process; in the case described in the link above, sixteen (16) times fewer.

This means these models need 8 times less data to store, can be 16 or more times faster, and use 16+ times less energy.

TPUs are not all that good with sparse matrices. They can be used to train dense versions, but inference efficiency with sparse matrices may not be all that great.
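As an illustrative sketch (not from the thread, and with sizes chosen purely for demonstration): a CSR sparse matrix at 1/16 density stores roughly 16x fewer coefficients than its dense counterpart, and a matrix-vector product only touches the stored values.

```python
import numpy as np
from scipy.sparse import random as sparse_random

rng = np.random.default_rng(0)
n = 1024
density = 1.0 / 16  # keep 1 coefficient in 16

# Dense weight matrix: n*n stored coefficients.
dense = rng.standard_normal((n, n)).astype(np.float32)

# Sparse matrix of the same shape with ~1/16 of the coefficients kept.
sparse = sparse_random(n, n, density=density, format="csr",
                       dtype=np.float32, random_state=0)

dense_coeffs = dense.size    # n * n
sparse_coeffs = sparse.nnz   # ~n * n / 16
print(dense_coeffs // sparse_coeffs)  # roughly 16x fewer stored values

# A sparse matrix-vector product only multiplies the stored coefficients.
x = rng.standard_normal(n).astype(np.float32)
y = sparse @ x
print(y.shape)
```

Note that the on-disk/in-memory saving is smaller than 16x in this format, since CSR also stores an index per nonzero value (e.g. an int32 index next to each float32 value doubles the per-coefficient cost), which is consistent with the ~8x storage figure above.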

HarHarVeryFunny 6 hours ago | parent [-]

TPUs do include dedicated hardware, SparseCores, for sparse operations.

https://docs.cloud.google.com/tpu/docs/system-architecture-t...

https://openxla.org/xla/sparsecore