| ▲ | utopiah 3 hours ago | |
> If it's really better than what we had before

That's a very big if. The whole point is that what we had before was made slowly, while this was made quickly. Slow isn't better in itself, but it typically means hours and hours of testing: going through painful problems that highlight the idiosyncrasies of the problem space, the things that are really weird and specific to whatever the tool is trying to address. With very little time, we can expect that very few things were tested, and tested properly (one comment even mentioned that the tests were also generated). "We", the audience of potentially interested users, then have to do that work ourselves (as plenty did in the comments on that post).

IMHO what you bring forward is precisely that:

- Can the new "solution" actually pass ALL the tests the previous one did? More?

This should be brought to the top, so the actual compromises can be understood and "we" can decide whether it's "better" for our context. In some cases faster with lossy output actually is better; in others, absolutely not. The difference between the new and the old solution isn't binary, and having no visibility on it is what makes such a process nothing more than yet another showcase that LLMs can indeed produce "something", which is absolutely boring while consuming a TON of resources, including our own attention.

TL;DR: there should be a test "harness", made by third parties (or taken from the well-known software it is closest to), that an LLM-generated piece of code has to pass before any actual comparison is made.
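To make that concrete, here is a minimal sketch of what such a harness could look like: a differential test that runs the reference tool and the LLM-generated candidate over the same corpus of inputs and reports where they diverge. The command names (reference-tool, llm-generated-tool) and the corpus/ directory are placeholders I'm assuming, not anything from the post.

    import subprocess
    from pathlib import Path

    # Hypothetical names: swap in the actual reference tool, the LLM-generated
    # replacement, and a corpus of real-world inputs (ideally taken from the
    # well-known software's own test suite).
    REFERENCE_CMD = ["reference-tool"]
    CANDIDATE_CMD = ["llm-generated-tool"]
    CORPUS_DIR = Path("corpus")

    def run(cmd, input_path):
        # Run one tool on one input, capturing exit code and output bytes.
        result = subprocess.run(cmd + [str(input_path)], capture_output=True, timeout=60)
        return result.returncode, result.stdout

    def main():
        total, diverging = 0, []
        for case in sorted(CORPUS_DIR.iterdir()):
            total += 1
            ref = run(REFERENCE_CMD, case)
            cand = run(CANDIDATE_CMD, case)
            # "Better" is reduced here to behavioural equivalence; per context
            # you might instead tolerate lossy output, compare timings, etc.
            if ref != cand:
                diverging.append(case.name)
        print(f"{len(diverging)} of {total} cases diverge from the reference")
        for name in diverging:
            print(f"  DIFFERS: {name}")

    if __name__ == "__main__":
        main()

Of course, reducing "better" to byte-for-byte equivalence is itself a compromise: a lossy-but-faster tool would deliberately fail this check, and surfacing that trade-off explicitly is exactly the point.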