Except by all accounts they succeeded. I believe they have the fastest matmul on NVIDIA chips in the industry.

timmg 2 days ago | parent | next [-]

I was under the impression that their uptake is slow or non-existent. Am I wrong on that?

ozgrakkurt 2 days ago | parent | prev | next [-]

Is it really faster than cuBLAS?

	melodyogonna 2 days ago | parent [-]
		In some cases, yes. They're mostly identical in performance, though.

saagarjha 2 days ago | parent | prev | next [-]

CUTLASS would like to have a word with you.

fooblaster 2 days ago | parent | prev [-]

evidence?

Modular/Mojo is faster than NVIDIA's libraries on their own chips, and it is open source rather than a binary blob. See, for example, the four-part series that culminates in https://www.modular.com/blog/matrix-multiplication-on-blackw... for Blackwell.

	fooblaster 2 days ago | parent [-]
		thanks