yorwba 14 hours ago

Yes, I think this is basically an instance of the "emergent abilities mirage." https://arxiv.org/abs/2304.15004

If you measure completion rate on a task where a single mistake can cause a failure, you won't see noticeable improvements on that metric until all potential sources of error are close to being eliminated, and then if they do get eliminated it causes a sudden large jump in performance.
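The arithmetic behind that jump is easy to sketch. Assuming a hypothetical task of n independent steps, each succeeding with probability p, the all-or-nothing completion rate is p^n:

```python
# Why "single mistake = failure" metrics jump suddenly:
# with n independent steps each succeeding with probability p
# (illustrative numbers, not measurements), the task completes
# only if every step succeeds, i.e. with probability p**n.
def completion_rate(p: float, n: int) -> float:
    """Probability of finishing all n steps without a single error."""
    return p ** n

for p in (0.90, 0.99, 0.999):
    print(f"per-step p={p}: 100-step completion rate = {completion_rate(p, 100):.3f}")
# per-step p=0.90 gives roughly 0.000, p=0.99 roughly 0.366,
# p=0.999 roughly 0.905 -- near-zero until the error rate is
# almost gone, then a large jump.
```

Tracking p per component makes the trend visible long before the headline completion rate moves at all.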

That's fine if you just want to know whether the current state is good enough on your task of choice, but if you also want to predict future performance, you need to break it down into smaller components and track each of them individually.

thesz 3 hours ago | parent | next [-]

  > until all potential sources of error are close to being eliminated
This is what PSP/TSP (the Personal/Team Software Process) prescribed: one has to (continually) review one's own work to identify the most frequent sources of (user-facing) defects.

  >  if you also want to predict future performance, you need to break it down into smaller components and track each of them individually.
This is also one of the tenets of PSP/TSP: if a task's estimate is longer than a day (8 hours), break it down.

This is fascinating. The LLM community is rediscovering PSP/TSP rules that were laid down more than twenty years ago.

What the LLM community misses is that in PSP/TSP it is the individual software developer who is responsible for figuring out what they need to look after.

What I see instead is LLM users trying to harness LLMs against what they perceive as errors. It's not that the LLMs are learning; it's that their users are trying to rein them in with prompts.

Bombthecat 6 hours ago | parent | prev [-]

That's how the public perceives it, though.

It's useless and never gets better, until it suddenly, unexpectedly gets good enough.

ForHackernews 5 hours ago | parent [-]

My robo-chauffeur kept crashing into different things until one day he didn't.