What example do you need? In every single benchmark AI is getting better and better.

Before someone says "but benchmarks don't reflect the real world...", please name what metric you think is meaningful if not benchmarks. Token consumption? OpenAI/Anthropic revenue?

▲

jacobsenscott 2 days ago | parent | next [-]

Whenever I try and use a "state of the art" LLM to generate code it takes longer to get a worse result than if I just wrote the code myself from the start. That's the experience of every good dev I know. So that's my benchmark. AI benchmarks are BS marketing gimmicks designed to give the appearance of progress - there are tremendous perverse financial incentives.

This will never change because you can only use an LLM to generate code (or any other type of output) you already know how to produce and are expert at - because you can never trust the output.

▲

whycombinetor 2 days ago | parent | next [-]

Third party benchmarks like terminalbench exist.

W.r.t. code changes, especially small ones (say 50 lines spread across 5 files): if you can't get an agent to make nearly exactly the code changes you want, just faster than you, that's a you problem at this point. If it would take you maybe 15 minutes, grok-code-fast-1 can do it in 2.

▲

trollbridge 2 days ago | parent | prev [-]

Right. With careful use of AI, I can gather information that helps me make better designs (like summaries of the best currently available frameworks or libraries for a given project), but as far as just generating an architecture and then generating the code and devops and so on for that? It's just not there, unless you're creating an app that effectively already exists, like some basic CRUD app.

If you're creating basic CRUDs, what on earth are you doing? That kind of thing should have been automated a long time ago.

▲

whycombinetor 2 days ago | parent [-]

What do you mean when you say building CRUD apps should be automated?

▲

trollbridge 2 days ago | parent | next [-]

CRUD apps are ridiculously simple and have been in existence my entire life. Yet it is surprisingly difficult to make a basic CRUD and host it somewhere. The bulk of useful but simple business apps are just a CRUD with a tiny bit of customisation and integration around them.

It is true that LLMs make it easier to build these kinds of things without having to become a competent programmer first.

▲

lomase 2 days ago | parent [-]

I don't know what kind of CRUD apps you work on. The kind of CRUD apps people pay me to work on are not simple.

▲

beeflet 2 days ago | parent | prev | next [-]

Conventionally, it should have been abstracted by a higher-level language.

▲

machomaster 2 days ago | parent | prev [-]

E.g. use Rails and generate scaffolding. That makes it really fast and easy to make a CRUD app.

▲

azemetre a day ago | parent | prev | next [-]

What metrics that aren't controlled by industry show AI getting better? Genuinely curious, because those "ranking sites" seem to me to be infested with venture capital, so hardly fair or unbiased. The only reports I hear from academia are overly negative on AI.

▲

fzeroracer 2 days ago | parent | prev | next [-]

AI is getting better at every benchmark. Please ignore that we're not allowed to see these benchmarks and also ignore that the companies in question are creating the benchmarks that are being exceeded.

▲

philipwhiuk 2 days ago | parent | prev | next [-]

OpenAI net profit.

The figures for cost are wildly off to start with.

▲

bluefirebrand 2 days ago | parent | prev [-]

> please name what metric you think is meaningful

Job satisfaction and human flourishing

By those metrics, AI is getting worse and worse

▲

machomaster 2 days ago | parent [-]

AI is very satisfied in doing the job, just ask it.

AI is able to speed up the progress, to give more resources, to give the most important thing people have - time. The fact that these incredible gifts are misused (or used inefficiently) is not the problem of AI. This would be like complaining that the objective positive of increased food production is actually a negative, because people are getting fatter.

▲

bluefirebrand a day ago | parent | next [-]

> AI is very satisfied in doing the job, just ask it

I could not care less about AI's satisfaction in anything

▲

lomase 2 days ago | parent | prev [-]

Imagine anthropomorphizing this hard.

▲

machomaster a day ago | parent [-]

You misunderstood. This is how the conversation went:

1. Is there steady progress in AI?

2. What example do you need? In every single benchmark AI is getting better and better.

3. Job satisfaction and human flourishing.

Hence my answer "AI is very satisfied in doing the job, just ask it". It came about because of the stupid comment 3, which tried to pin the blame on unrelated things (akin to pointing to obesity when asked what metrics make one say that agriculture/transportation have not made progress in the last 100 years) and at the same time anthropomorphized AI. I simply accepted the premise and kept answering on the same level in order to demonstrate the stupidity of their answer.

▲

yeasku 18 hours ago | parent [-]

I did not misunderstand anything, clanker.

▲

machomaster 2 hours ago | parent [-]

I don't even know who you are. I was answering user "lomase".