Ask HN: How do we measure software in the LLM era?

A bit of a rant. Sorry!

With a probabilistic, pluggable 'brain' sitting in parts of the solution, how are you measuring whether anything is better or worse?

I am at a loss to quantify whether any change is improving or worsening anything. Part of the problem is probably the number of metrics that keep popping up:

* Accuracy

* Cost of running

* Context

* Size

* Time

* Turns

All of these vary over a wide band, even with the same 'brain' on the same 'provider'. It is not so different from a database straining under load, to draw on simpler times. But here, working out which elastic band is getting pulled in which direction is worse than playing 3D Tetris.
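
One way to treat the band itself as the measurement, as a rough sketch only: run the same task many times and report a median plus a p10/p90 spread per metric instead of a single number. The `RunResult` fields and the `summarize` helper below are made-up names for illustration, not part of any library or harness.

```python
from dataclasses import dataclass
from statistics import median, quantiles


@dataclass
class RunResult:
    """One run of the same task; the field names are hypothetical."""
    correct: bool
    cost_usd: float
    tokens: int
    seconds: float
    turns: int


def summarize(runs: list[RunResult]) -> dict[str, dict[str, float]]:
    """Return accuracy plus a p10/median/p90 band for each numeric metric."""
    out = {"accuracy": {"mean": sum(r.correct for r in runs) / len(runs)}}
    for name in ("cost_usd", "tokens", "seconds", "turns"):
        values = [float(getattr(r, name)) for r in runs]
        # n=10 yields nine cut points; the first is p10, the last is p90.
        deciles = quantiles(values, n=10, method="inclusive")
        out[name] = {
            "p10": deciles[0],
            "median": median(values),
            "p90": deciles[-1],
        }
    return out
```

With something like this, "the same model on the same provider" becomes a distribution you can compare across days or commits, rather than one noisy sample. It needs at least two runs per task, and realistically many more.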

Then there is the harness-side variability of tool choices, which seems to be the only knob the developer has some control over these days, other than the deterministic parts of the system.

How are we even going to triage a ticket with so much variability in a runtime that, apparently, is still called software?

Do we just tell users that they are on their own, and that whatever they need to solve is between them and their brain of choice?

What are you doing?

Abushan 8 hours ago

I suspect we are entering an era where software output matters less than leverage. If one engineer with AI can replace weeks of work, metrics like LOC and story points become almost meaningless.

	bonigv 4 hours ago
		Agree that LOC and story points are not metrics that need tracking. What is happening is that the same model and provider can produce a very different experience in terms of turns, tokens, time and, to a certain extent, results. This is expected and to some extent acceptable. In this scenario, how will we write a test case? What is success? What moves the needle enough to say that a commit made the software perform 20% better?
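
One possible shape for such a test, sketched under assumptions: collect repeated runs of one metric (say, tokens per task) before and after the commit, then bootstrap a confidence interval on the relative change in the mean. The function names, the 2000 resamples and the 95% level are illustrative choices, and the metric is assumed to be positive with lower being better.

```python
import random
from statistics import mean


def relative_change_ci(before, after, n_boot=2000, level=0.95, seed=0):
    """Bootstrap CI for (mean(after) - mean(before)) / mean(before)."""
    rng = random.Random(seed)  # fixed seed so the check is repeatable
    changes = []
    for _ in range(n_boot):
        # Resample each group with replacement, same size as the original.
        b = rng.choices(before, k=len(before))
        a = rng.choices(after, k=len(after))
        changes.append((mean(a) - mean(b)) / mean(b))
    changes.sort()
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return changes[lo_idx], changes[hi_idx]


def is_improvement(before, after, **kwargs):
    """True only if the whole interval shows a decrease (lower is better)."""
    _, high = relative_change_ci(before, after, **kwargs)
    return high < 0
```

Under this framing, "the commit made it 20% better" would mean the upper end of the interval sits at or below -0.20, not that one lucky run came in 20% cheaper.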

austin-cheney 8 hours ago

Software should be measured exactly the same as before:

* execution speed

* execution resource cost

* average time to debug issues

* average time to add/remove features

	bonigv 4 hours ago
		Agree. The challenge is that the same model/provider can produce dramatically different numbers for the first two items. Maintainability is an invariant of the quality of the system, so measuring it is a good metric for sure.

madikz 5 hours ago

[flagged]