measure outcomes (impact), not effort (token usage, lines of code, code coverage, hours worked, etc.)

lokar a day ago | parent | next [-]

The whole phenomenon of metric-based eng evaluations exists because leadership does not trust line managers to evaluate individual engineers.

4yfr a day ago | parent | prev | next [-]

What outcomes though? The ones I’ve seen posted are still nonsensical metrics that a publicly traded firm absolutely doesn’t care about.

It wants to see faster R&D, higher revenues from existing assets, greater operating margins, higher sales to invested capital ratio and so on…

The best way to measure that for a software firm is uptime of services, usage, and project completion time.

wpasc a day ago | parent [-]

measuring uptime? I've seen Anthropic's status page, and they are a >$1 trillion company who "largely solved" coding. so clearly you aren't correct. /s

janalsncm a day ago | parent | next [-]

Ok, uptime. How do you measure an individual’s contribution to uptime? If Claude goes down does everyone take a hit? If Claude stays up everyone gets rewarded?

If so, your metric cannot distinguish between a bad engineer and a good one.

If not, you have the same problem you started with: measuring contributions to “uptime”.

andsoitis a day ago | parent [-]

> If so, your metric cannot distinguish between a bad engineer and a good one.

A metric that moves in the same direction and by the same amount for everyone because of an external event isn’t a problem. The great engineer’s delta in performance will still outweigh the poor engineer’s, since the movement due to external circumstances is the same for both and thus cancels out.
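
To make the cancellation argument concrete, here is a minimal sketch. It assumes (this is not stated in the thread) that each engineer's observed score is an individual contribution plus a shared shock, such as an outage that hits everyone. The function names and numbers are hypothetical.

```python
# Hypothetical model: observed score = individual contribution + shared shock.
# The shared shock (e.g. an outage affecting everyone) is identical for all
# engineers, so it drops out when two engineers are compared.

def observed_scores(individual, shared_shock):
    """Return each engineer's observed score under the additive assumption."""
    return {name: value + shared_shock for name, value in individual.items()}

def gap(scores, a, b):
    """Difference in observed score between engineers a and b."""
    return scores[a] - scores[b]

# Made-up individual contributions, for illustration only.
individual = {"great": 8, "poor": 3}

good_quarter = observed_scores(individual, shared_shock=2)
bad_quarter = observed_scores(individual, shared_shock=-5)

# The gap between the two engineers is the same in both quarters.
assert gap(good_quarter, "great", "poor") == gap(bad_quarter, "great", "poor") == 5
```

The sketch only holds if an individual component exists to begin with. If the metric is purely shared (uptime alone), every gap is zero, which is the objection raised in the parent comment.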

janalsncm 4 hours ago | parent [-]

That’s a new one. I have never heard of a company which operates like this, giving everyone equal reward no matter how much they contribute individually.

lokar a day ago | parent | prev | next [-]

Unfortunately, that is a group metric; we need individual metrics.

4yfr a day ago | parent | prev [-]

[flagged]

wpasc a day ago | parent [-]

my friend, I was being sarcastic before, and I am agreeing with you. LoC, token spend, etc. are horrible metrics. Software uptime is a great metric. I'm merely lamenting that in the age we're in, uptimes are getting worse and worse.

unknownfuture 21 hours ago | parent | prev | next [-]

Okay.

How?

This is an org pushing thousands of PRs a day. How do you solve the attribution problem for any one engineer's work given some set of impact metrics?

And keep in mind, most common impact metrics are trailing indicators, often over relatively long time horizons.

jdlshore 17 hours ago | parent [-]

As VPEng, I didn’t use metrics to assess individuals. Too prone to metric gaming.

Instead, I had a career ladder with a detailed rubric describing the skills an engineer at each level was expected to have. (Including communication and peer-leadership skills.)

Managers performed qualitative assessment of employees, using the career ladder as a guide. They relied on tech leads and Staff engineers to help them understand people’s skills, and provided 1:1 feedback and coaching.

We did use impact-based metrics to assess the results of important initiatives. We solved the attribution and lagging indicator problems by estimating impact rather than measuring it, and using a series of proxy measurements (activation, usage, retention, etc.) as a feedback mechanism for revising those estimates.

unknownfuture 12 hours ago | parent [-]

Right, so you're sane. :D

Unfortunately I think we're entering (have entered?) a period of insanity.

The trouble is AI is being sold as an individual engineering accelerant. I suspect at the most AI pilled orgs you'll then see a commensurate push that starts off with measuring usage (tokens), then measuring output (PRs, code reviews), and then a lot of talk about impact while everyone quietly admits that remains as impossible now as it was fifty years ago.

Why? Because leadership is looking to (and selling, both internally and to the market) AI as the solution to all of their problems, which means they have to prove outcomes that justify their sky high AI budgets.

Higher level metrics at the org/division/product/project level aren't satisfying and flashy enough as they're slow moving and attenuated.

And squishy individual or team level assessments that rely on strong management won't show well on a cost-benefit comparison chart to the board.

At bottom I suspect AI-pilled leadership wants to turn software into an assembly line and measure accordingly. Your post perfectly captures why it's still not that easy, and why the real problems in software remain the same and are unsolved by AI: building the right thing, at the right time, and then later figuring out what went well, what didn't, and trying to make those successes more repeatable and failures less likely.

jdlshore 10 hours ago | parent [-]

> Unfortunately I think we're entering (have entered?) a period of insanity.

It’s been true for a long time. One of the hardest things as a senior leader in software is dealing with people demanding “accountability” (by which they mean making long-term plans with impossibly precise forecasts) and focusing on costs, all while ignoring value and refusing to engage in prioritization. (I swear, if I hear “it’s all important” again…)

People are just… shallow. They operate on feelings and vibes. They follow the herd without thinking critically. Then they get angry when their dreams clash with reality, and they blame the messenger when those dreams turn out to be fantasies.

But you’re right. AI is bringing out the worst in these tendencies. I think it’s because it’s so convincing when you don’t dig deeply, or aren’t an expert in the subject being discussed. On the plus side, it raises the floor, but I think we’re in for some difficult times before the lessons are learned. I don’t think it will take long, though: I suspect that naive use of AI is going to massively speed up the technical debt curve, and where it used to take 5-9 years to destroy a codebase, it will now take closer to 1-2.

dheera a day ago | parent | prev | next [-]

> measure outcomes (impact)

This is also not easy. In particular proactively preventing bugs is not rewarded

andsoitis 21 hours ago | parent [-]

> In particular proactively preventing bugs is not rewarded

The main way I think you can proactively prevent bugs in a meaningful way is by crafting and propagating better architecture.

Better (or worse) architecture, and its adoption, can be measured through a mix of quantitative and qualitative means, so those measurements could be used to evaluate the impact of the engineer driving that architecture.

dheera 20 hours ago | parent [-]

That's not how managers evaluate engineers at these corporations. The engineer who haphazardly launched on Friday then promptly saved the team at 3am and worked the weekends gets the promotion, while the one who prevented a bug from happening "didn't get anything done" and gets the PIP.

veber-alex a day ago | parent | prev [-]

It's not flashy.

When shit just works for months or years no one is going to come and praise you for stuff you did a while back.

You are better off breaking stuff and then fixing it to show how useful you are.