imtringued 2 days ago

Assuming a parallel programming language and an SMT-aware compiler, the CPU could just switch to another block of static instructions while it is waiting.

namibj 2 days ago

You mean like e.g. Nvidia Maxwell?

(There's decent third-party documentation from Nervana Systems from when they squeezed all they could out of f32 dense matrix multiply, at the time substantially faster than Nvidia's cuBLAS library; this is by no means exclusive to that architecture, though.)

tliltocatl 2 days ago

> Assuming a parallel programming language

Assuming a parallelizable workload, which is often not the case.
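
To put a rough number on that point, here is a minimal sketch based on Amdahl's law, which bounds the speedup by the serial part of the work. The function name `amdahl_speedup` and the sample fractions are illustrative assumptions, not measurements from any real workload.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup per Amdahl's law.

    parallel_fraction: share of the runtime that can run in parallel (0..1).
    workers: number of hardware threads or cores applied to that share.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is untouched; only the parallel part shrinks with more workers.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical fractions, chosen only to show how quickly the bound flattens.
for fraction in (0.5, 0.9, 0.99):
    print(fraction, [round(amdahl_speedup(fraction, n), 2) for n in (2, 8, 64)])
```

If only half of the work can run in parallel, the bound stays below 2x no matter how many threads the hardware can switch between.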