jacobsenscott 2 days ago
Whenever I try to use a "state of the art" LLM to generate code, it takes longer to get a worse result than if I had just written the code myself from the start. That's the experience of every good dev I know, so that's my benchmark. AI benchmarks are BS marketing gimmicks designed to give the appearance of progress; there are tremendous perverse financial incentives. This will never change, because you can only use an LLM to generate code (or any other type of output) that you already know how to produce and are expert at, since you can never trust the output.
whycombinetor 2 days ago
Third-party benchmarks like Terminal-Bench exist. With respect to code changes, especially small ones (say, 50 lines spread across 5 files), if you can't get an agent to make nearly exactly the code changes you want, only faster than you would, that's a you problem at this point. If it would take you maybe 15 minutes, grok-code-fast-1 can do it in 2.
trollbridge 2 days ago
Right. With careful use, I can use AI to gather information that helps me make better designs (like giving me summaries of the best currently available frameworks or libraries for a given project), but as for just generating an architecture and then generating the code, devops, and so on for it? It's just not there, unless you're creating an app that effectively already exists, like some basic CRUD app. If you're creating basic CRUD apps, what on earth are you doing? That kind of thing should have been automated a long time ago.