namibj 2 days ago

You mean like e.g. Nvidia Maxwell?

(There's decent third-party documentation from Nervana Systems, from when they squeezed all they could out of f32 dense matrix multiply; their kernels were, at the time, substantially faster than Nvidia's cuBLAS library. This is by no means exclusive to that architecture, though.)