Thank you for the kind words! Are you saying that AI model innovation stopped at GPT-2 and everyone has performance and GPU utilization figured out?

Are you talking about NVIDIA Hopper or any of the rest of the accelerators people care about these days? :) There is a lot more performance and TCO at stake here than with traditional CPU compilers.

lqstuart 2 days ago | parent [-]

I’m saying actual algorithmic (as in not data) model innovation has never been a significant part of the revenue generation in the field. You get your random forest, or ResNet, or BERT, or MaskRCNN, or GPT-2-with-One-Weird-Trick, and then you spend four hours trying to figure out how to preprocess your data.

On the flip side, far from figuring out GPU efficiency, most people with huge jobs are network-bottlenecked. And that’s where the problem arises: solutions for collective comms optimization tend to explode in complexity because, among other reasons, you now have to package entire orchestrators in your library somehow, which may fight with the orchestrators that actually launch the job.

Doing my best to keep it concise, but Hopper is a good case study. I want to use Megatron! Suddenly you need FP8, which means the CXX11 ABI, which means recompiling Torch along with all those nifty toys like flash attention, flashinfer, vllm, whatever. Ray, jsonschema, Kafka and a dozen other things also need to match the same glibc and libstdc++ versions. So using that as an example, suddenly my company needs C++ CI/CD pipelines, dependency management, etc., when we didn’t before. And I just spent three commas on these GPUs. And most likely, I haven’t made a dime on my LLMs, or autonomous vehicles, or weird cyborg slavebots.

So what all that boils down to is just that there’s a ton of inertia against moving to something new and better. And in this field in particular, it’s a very ugly, half-assed, messy inertia. It’s one thing to replace well-designed, well-maintained Java infra with Golang or something, but it’s quite another to try to replace some pile of shit deep learning library that your customers had to build a pile of shit on top of just to make it work, and all the while fifty college kids are working 16 hours a day to add even more in the next dev release, which will of course be wholly backwards and forwards incompatible.

But I really hope I’m wrong :)

growthwtf a day ago | parent [-]

Lattner's comment aside (which I'm fanboying a little bit at), I do tend to agree with your pessimism/realism, for what it's worth. It's gonna be a long, long time before that whole mess you're describing is sorted out, but I'm confident that over the next decade we will do it. There's just too much money to be made by fixing it at this point.

I don't think it's gonna happen instantly, but it will happen, and Mojo/Modular is really the only language platform I see taking a coherent approach to it right now.

lqstuart 8 hours ago | parent [-]

I tend to agree with you, but I hoped the field would start collectively figuring out how to be big boys with CI/CD and dependency management back in 2017. I thought Google’s awkward source release of BERT was going to be the low point, and we’d switch to Torch and be saved. Instead, it’s gotten so much worse. And the kind of work that the Python core team has been putting into package and dependency management is nothing short of heroic, and it still falls short because PyTorch extends the Python runtime itself, and now torch.compile is intercepting CPython’s frame evaluation and NVIDIA is releasing Python CUDA bindings. It’s just such a massive, uphill, ugly moving target to try to run down. And I sit here thinking the same as many of these comments: on the one hand, I can’t imagine we’re still using Python 3 in 2035? 2050?? But on the other hand, I can’t envision a path out of the mess that lets anyone make money, or at least continue pretending they’ll start to soon.