kingkongjaffa 13 hours ago

Wait, but why?

If it's really better than what we had before, what does it matter how it was made? It's literally hacked together with the tools of the day (LLMs); isn't that the very hacker ethos? Patching stuff together that works in a new and useful way.

5x speed improvements on PDF text extraction might be great for some applications I'm not aware of; I wouldn't just dismiss it out of hand because the author used $robot to write the code.
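And a claim like that is cheap to check. A minimal sketch, assuming a hypothetical fast_extract() for the new tool and using pypdf (a real, established library) as the baseline:

    # Rough timing sketch: the baseline uses pypdf's real API; the
    # new tool's fast_extract() is hypothetical and stubbed out.
    import time
    from pypdf import PdfReader

    def baseline_extract(path):
        # Concatenate the text of every page with pypdf.
        reader = PdfReader(path)
        return "\n".join(page.extract_text() or "" for page in reader.pages)

    def timed(fn, path, runs=5):
        # Best-of-N wall-clock time to damp warm-up noise.
        best = float("inf")
        for _ in range(runs):
            start = time.perf_counter()
            fn(path)
            best = min(best, time.perf_counter() - start)
        return best

    old = timed(baseline_extract, "sample.pdf")  # any local test file
    # new = timed(fast_extract, "sample.pdf")    # hypothetical new tool
    # print(f"speedup: {old / new:.1f}x")
    print(f"baseline: {old:.3f}s")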

Presumably the thought of making the thing in the first place, and deciding which features to add or leave out, mattered more than how the code was generated?

utopiah 3 hours ago | parent

> If it's really better than what we had before

That's a very big if. The whole point is that what we had before was made slowly, and this was made quickly. Slowness in itself isn't a virtue, but it typically means hours and hours of testing: working through painful problems that surface the idiosyncrasies of the problem space, things that are really weird and specific to whatever the tool is trying to address.

In such cases we can expect that, with very little time spent, very few things were tested, and tested properly (one comment even mentioned that the tests themselves were generated). "We", the audience of potentially interested users, then have to do that work instead (as plenty did in the comments on that post).

IMHO what you bring forward is precisely the question:

- can the new "solution" actually pass ALL the tests the previous one did? More?

This should be brought to the top, and the actual compromises can then be understood; "we" can then decide whether it's "better" for our context. In some cases faster with lossy output is actually better; in others, absolutely not. The difference between the new and the old solutions isn't binary, and having no visibility into it is what makes such a process nothing more than yet another showcase that LLMs can indeed produce "something", which is absolutely boring while consuming a TON of resources, including our own attention.
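For instance, "lossy" doesn't have to stay hand-wavy. A rough sketch using Python's stdlib difflib, with both extractor functions assumed to already exist:

    # Rough sketch: score how "lossy" the new output is against a
    # trusted extractor, instead of calling it better/worse outright.
    import difflib

    def similarity(reference, candidate):
        # 1.0 = identical extracted text, 0.0 = no overlap at all.
        return difflib.SequenceMatcher(None, reference, candidate).ratio()

    # ref = trusted_extract("sample.pdf")  # assumed: known-good tool
    # new = fast_extract("sample.pdf")     # assumed: LLM-built tool
    # print(f"kept {similarity(ref, new):.1%} of the reference text")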

TL;DR: there should be a test "harness", made by third parties (or derived from the well-known software the tool is closest to), that an LLM-generated piece of code has to pass before any actual comparison is made.
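Something along these lines, where every name is illustrative and the similarity floor is a per-context choice:

    # Illustrative harness: run a PDF corpus through a trusted
    # extractor and the candidate, fail any doc that drifts too far.
    import difflib
    import pathlib

    SIMILARITY_FLOOR = 0.98  # tolerated loss; tune for your context

    def run_harness(corpus_dir, reference, candidate):
        failures = []
        for pdf in sorted(pathlib.Path(corpus_dir).glob("*.pdf")):
            ref = reference(str(pdf))
            new = candidate(str(pdf))
            score = difflib.SequenceMatcher(None, ref, new).ratio()
            if score < SIMILARITY_FLOOR:
                failures.append((pdf.name, score))
        for name, score in failures:
            print(f"FAIL {name}: {score:.1%} of reference text kept")
        print(f"{len(failures)} document(s) below the floor")

    # run_harness("corpus/", trusted_extract, fast_extract)  # assumed names

Only once the candidate clears that bar does a headline speedup number mean anything.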

utopiah 3 hours ago | parent

Related: https://news.ycombinator.com/item?id=46437688