kregasaurusrex 3 days ago

On Friday I was converting a constrained solver from Python to another language, and ran into some difficulty substituting an optimizer that's a few lines of easily written SciPy but barely supported in the target language. One AI tool figured this out and fully re-implemented the solver using a custom linear algebra library it wrote from scratch. But another AI tool really struggled to get the syntax right for the common existing optimization libraries, and I felt like I was repeatedly putting queries (read: $) into the software equivalent of a slot machine: one that kept apologizing for not giving a testable answer while eating tens of dollars in direct costs as I waited for the "jackpot" of working code.
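For illustration, here's a minimal sketch of the kind of few-line SciPy constrained solve I mean (the objective and constraint are made up, not my actual solver):

    # Toy constrained minimization; the objective and constraint are illustrative only.
    import numpy as np
    from scipy.optimize import minimize

    def objective(x):
        return (x[0] - 1.0) ** 2 + (x[1] - 2.5) ** 2

    constraints = [{"type": "ineq", "fun": lambda x: x[0] - 2 * x[1] + 2}]
    bounds = [(0, None), (0, None)]

    result = minimize(objective, x0=np.array([2.0, 0.0]), method="SLSQP",
                      bounds=bounds, constraints=constraints)
    print(result.x, result.fun)

Re-creating even that much by hand in a language without a mature optimization ecosystem turns out to be a surprising amount of work.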

The feedback loop of "maybe next time it'll be right" turned into a few hundred queries, only to discover that the LLM's attempts formed a ~20-node cycle of things it had already tried that didn't work. Now you're out a couple of dollars and hours of engineering time.

moregrist 3 days ago | parent | next [-]

> One AI tool found this out and fully re-implemented the solver using a custom linear algebra library it wrote from scratch.

So slow, untested, and likely buggy, especially as the inputs become less well-conditioned?

If this was a jr dev writing code I’d ask why they didn’t use <insert language-relevant LAPACK equivalent>.

Neither LLM outcome seems ideal to me, tbh.

theshrike79 3 days ago | parent [-]

With mathematical things you can always write comprehensive and complete unit tests to check the AI's work.

TDD (and exhaustive unit tests in general) is a good idea with LLMs anyway. Just either tell it not to touch the tests, or, in Claude's case, use Hooks to _actually_ prevent it from editing any test file.

Then shove it at the problem and it'll iterate on a solution until the tests pass. It's like the Excel formula solver, but for code :D
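As a rough sketch of what I mean, something like this pins the generated code against a trusted reference (`my_solver.solve` is a hypothetical LLM-written function, checked against numpy):

    # Hypothetical LLM-written solver checked against numpy's reference implementation.
    import numpy as np
    import pytest
    from my_solver import solve  # hypothetical module under test

    @pytest.mark.parametrize("n", [2, 5, 20])
    def test_matches_numpy_on_random_systems(n):
        rng = np.random.default_rng(42)
        A = rng.standard_normal((n, n)) + n * np.eye(n)  # keep it well-conditioned
        b = rng.standard_normal(n)
        np.testing.assert_allclose(solve(A, b), np.linalg.solve(A, b),
                                   rtol=1e-8, atol=1e-10)

    def test_rejects_singular_matrix():
        with pytest.raises(Exception):
            solve(np.zeros((3, 3)), np.ones(3))

Lock those files down and the model only "wins" by actually satisfying them.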

moregrist 3 days ago | parent | next [-]

You could, and hope that you understand the problem domain and numerical analysis well enough to hit all the hard cases. And then you’d have expanded your codebase with lots of tests that are relevant to a linear algebra library, not to the problem you’re actually trying to solve.

Or you could use existing linear algebra libraries, which are highly optimized, highly tested, and have a well-understood API that’s easier to review.

And then get back to the legit hard stuff, like worrying about whether your linear solver needs preconditioning and how best to do that. Or any of the many numerical problems people tend to face when doing this kind of work.
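To make that concrete: with SciPy, that worry is a couple of lines, not a library. A rough sketch (the 1-D Poisson test matrix and Jacobi preconditioner are just illustrative choices):

    # Lean on SciPy's tested sparse solvers instead of hand-rolled linear algebra.
    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import cg, LinearOperator

    n = 1000
    # 1-D Poisson matrix: symmetric positive definite, tridiagonal.
    A = sp.diags([-1, 2, -1], offsets=[-1, 0, 1], shape=(n, n), format="csr")
    b = np.ones(n)

    # Simple Jacobi (diagonal) preconditioner; keeps CG's symmetry assumptions intact.
    inv_diag = 1.0 / A.diagonal()
    M = LinearOperator((n, n), matvec=lambda x: inv_diag * x)

    x, info = cg(A, b, M=M)
    print("converged" if info == 0 else f"cg stopped with info={info}")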

I’m not sure why you’d give the LLM a pass on reinventing the wheel here when you definitely wouldn’t with any other dev.

th0ma5 3 days ago | parent | prev [-]

I think we all understand this; we just don't think it works.

quatonion 3 days ago | parent [-]

I'm curious why you think it doesn't work, when there are plenty of people saying it does.

There are limitations at the moment, and I don't see many people disputing that, but it must be doing something right, and its abilities are improving every day. It's learning.

Sometimes I get the feeling a lot of antis painted themselves into a corner early on, and will die on this hill despite constant improvements in the technology.

I have seen similar things many times in my career. There was a time when everyone was very skeptical of high-level languages, writing everything in assembler come hell or high water, for example.

At some point it is going to single-shot an entire OS or refactor a multi-million-line codebase. Will that be enough to convince you?

From my perspective I like to be prepared, so I'm doing what I have always done: understand and gain experience with these new tools. I much prefer that to missing the boat.

And it's quite fun, and better than you might imagine, as long as you put a bit of effort in.

quantumHazer 3 days ago | parent [-]

> From my perspective I like to be prepared

The same you who thinks they've proved P = NP with ChatGPT?

brookst 3 days ago | parent | prev [-]

A very relatable experience. But not all that different from how humans work when in unfamiliar domains.

leptons 3 days ago | parent | next [-]

I'd rather work with a human. Even with our flaws, it's still better than constantly being lied to by a tin can. If a junior kept delivering broken results as often as the "AI" does, they wouldn't be on my team for long.

th0ma5 3 days ago | parent | prev [-]

Except... completely different.