Whenever I try to use a "state of the art" LLM to generate code, it takes longer to get a worse result than if I had just written the code myself from the start. That's the experience of every good dev I know, so that's my benchmark. AI benchmarks are BS marketing gimmicks designed to give the appearance of progress; there are tremendous perverse financial incentives.

This will never change because you can only use an LLM to generate code (or any other type of output) you already know how to produce and are expert at - because you can never trust the output.

▲

whycombinetor 2 days ago | parent | next [-]

Third party benchmarks like terminalbench exist.

W.r.t. code changes, especially small ones (say 50 lines spread across 5 files): if you can't get an agent to make almost exactly the code changes you want, just faster than you could, that's a you problem at this point. If it would take you maybe 15 minutes, grok-code-fast-1 can do it in 2.

▲

trollbridge 2 days ago | parent | prev [-]

Right. With careful use of AIs, I can use them to gather information to help me make better designs (like giving me summaries of the current best available frameworks or libraries to choose for a given project), but as far as just generating an architecture and then generating the code and devops and so on for that? It's just not there, unless you're creating an app that effectively already exists, like some basic CRUD app.

If you're creating basic CRUDs, what on earth are you doing? That kind of thing should have been automated a long time ago.

▲

whycombinetor 2 days ago | parent [-]

What do you mean when you say building CRUD apps should be automated?

▲

trollbridge 2 days ago | parent | next [-]

CRUD apps are ridiculously simple and have been in existence my entire life. Yet it is surprisingly difficult to make a basic CRUD and host it somewhere. The bulk of useful but simple business apps are just a CRUD with a tiny bit of customisation and integration around them.

It is true that LLMs make it easier to build these kinds of things without having to become a competent programmer first.

▲

lomase 2 days ago | parent [-]

I don't know what kind of CRUD apps you work on. The kind of CRUD apps people pay me to work on are not simple.

▲

beeflet 2 days ago | parent | prev | next [-]

Conventionally, it should have been abstracted away by a higher-level language.

▲

machomaster 2 days ago | parent | prev [-]

E.g. using Rails and its scaffold generator. That makes it really fast and easy to make a CRUD app.
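
To make "automated CRUD" a bit more concrete, here is a minimal sketch of the idea in Python with the standard-library `sqlite3` module. It is not how Rails does it; `make_crud`, the `posts` table, and its columns are hypothetical names chosen for illustration. The point is only that the four operations can be derived from a table description instead of being written by hand.

```python
import sqlite3

ALLOWED_TYPES = {"TEXT", "INTEGER", "REAL"}  # assumption: enough for a sketch


def make_crud(conn, table, columns):
    """Create `table` and return create/read/update/delete helpers for it.

    `columns` maps column name -> SQL type, e.g. {"title": "TEXT"}.
    """
    # Identifiers are interpolated into SQL text, so only allow plain names.
    for name in (table, *columns):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    for sql_type in columns.values():
        if sql_type not in ALLOWED_TYPES:
            raise ValueError(f"unsupported type: {sql_type!r}")

    col_defs = ", ".join(f"{c} {t}" for c, t in columns.items())
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} (id INTEGER PRIMARY KEY, {col_defs})"
    )

    def check(values):
        # Reject empty input and columns that are not part of the declaration.
        unknown = set(values) - set(columns)
        if unknown or not values:
            raise ValueError(f"bad columns: {sorted(unknown) or 'none given'}")

    def create(**values):
        check(values)
        names = ", ".join(values)
        marks = ", ".join("?" for _ in values)
        cur = conn.execute(
            f"INSERT INTO {table} ({names}) VALUES ({marks})",
            tuple(values.values()),
        )
        return cur.lastrowid

    def read(row_id):
        cur = conn.execute(f"SELECT * FROM {table} WHERE id = ?", (row_id,))
        row = cur.fetchone()
        if row is None:
            return None
        # Pair each value with its column name from the cursor metadata.
        return dict(zip((d[0] for d in cur.description), row))

    def update(row_id, **values):
        check(values)
        sets = ", ".join(f"{c} = ?" for c in values)
        cur = conn.execute(
            f"UPDATE {table} SET {sets} WHERE id = ?",
            (*values.values(), row_id),
        )
        return cur.rowcount  # number of rows changed

    def delete(row_id):
        cur = conn.execute(f"DELETE FROM {table} WHERE id = ?", (row_id,))
        return cur.rowcount  # number of rows removed

    return {"create": create, "read": read, "update": update, "delete": delete}
```

Usage would look something like `posts = make_crud(sqlite3.connect(":memory:"), "posts", {"title": "TEXT", "body": "TEXT"})` followed by `posts["create"](title="Hello", body="First")`. Validation, auth, and integrations, which is where the non-simple part of real CRUD apps tends to live, are deliberately left out.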