aspenmartin 5 hours ago

I appreciate the data here, but I don't think the read is quite right.

Saying we have linear capability for super-linear cost compares an unbounded variable (dollars) to bounded instruments (because benchmarks saturate). On unbounded measures, growth is exponential; you can see METR time horizons double every ~4-7 months (https://metr.org/blog/2026-1-29-time-horizon-1-1/). And capability being proportional to log(compute) is what the scaling law predicts.
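To put the doubling-time range above on a per-year footing, here is a minimal sketch that assumes clean exponential growth. The function name is made up for illustration; only the 4 and 7 month figures come from the comment.

```python
def annual_growth_factor(doubling_months: float) -> float:
    """Multiple gained over 12 months by a quantity that doubles every `doubling_months`."""
    return 2.0 ** (12.0 / doubling_months)


# The two ends of the ~4-7 month doubling range quoted above.
for months in (4, 7):
    factor = annual_growth_factor(months)
    print(f"doubling every {months} months -> {factor:.1f}x per year")
```

Under that assumption the range works out to roughly 3.3x to 8x per year.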

Epoch puts training cost growth at ~2.4x/year, as your link shows. Meanwhile, cost for fixed capability falls ~10-40x/year (https://epoch.ai/data-insights/llm-inference-price-trends), and lab revenue is growing ~10x/year! Anthropic went from $1B to $9B to $30B+ run rate in ~15 months, and OpenAI is at ~$25B.
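As a rough check on the revenue figure, the run-rate numbers can be annualized. This sketch assumes the whole move from $1B to $30B happened over the ~15 months and that growth was smoothly exponential. Both are simplifications, and the function name is illustrative.

```python
def annualized_multiple(start: float, end: float, months: float) -> float:
    """Per-year growth multiple implied by going from `start` to `end` in `months`."""
    return (end / start) ** (12.0 / months)


# Run rate in $B, taken from the comment above; the 15-month span is approximate.
anthropic = annualized_multiple(start=1.0, end=30.0, months=15.0)
print(f"Anthropic run rate: ~{anthropic:.0f}x per year")
```

Under those assumptions the implied pace is about 15x per year, the same order of magnitude as the ~10x/year figure.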

On [3]: the "destroying value" conclusion flips sign on an assumed 15% baseline rework rate. The report's most direct metric is +16% merged PRs per dev. The RCT evidence is genuinely mixed (METR: -19%, with n = 20 and Claude 3.x; Cui et al.: +26%), but it's just super hard to do this well. I think the Faros stuff was pretty cool. I hadn't seen it before, so thank you for the reference.

oudlys 5 hours ago | parent | next [-]

>"On unbounded measures, growth is exponential"

Maybe. There was a great comment in the thread on Fable 5 yesterday about benchmark comparisons between Fable and the latest Opus models. Here it is: https://news.ycombinator.com/item?id=48464600.

You could be right, but this is the most direct benchmark comparison I could find and it's not that strong.

>the "destroying value" conclusion flips sign on an assumed 15% baseline rework rate. The report's most direct metric is +16% merged PRs per dev.

I discuss this directly in my analysis. There's also an 860% increase in code churn. You only need 9% of that to be allocated to wasteful rework to drive throughput flat relative to the 15% rework baseline, not relative to an assumed ideal state where there was no rework.

But even if it were not true, a 16% throughput improvement is pretty weak given the investment - especially given the direct evidence of quality degradation. IMO.
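One simple way to frame the break-even point in the rework argument above is sketched below. It is not necessarily the calculation used in the analysis. It assumes useful output equals merged PRs times (1 - rework rate), which is a modeling assumption, and the function name is illustrative.

```python
def breakeven_rework_rate(throughput_gain: float, baseline_rework: float) -> float:
    """Rework rate at which a throughput gain yields no extra useful output.

    Assumes useful output = merged PRs * (1 - rework rate). Solves
    (1 + gain) * (1 - r) = 1 * (1 - baseline) for r.
    """
    return 1.0 - (1.0 - baseline_rework) / (1.0 + throughput_gain)


# +16% merged PRs per dev and a 15% baseline rework rate, both from the thread.
r = breakeven_rework_rate(throughput_gain=0.16, baseline_rework=0.15)
print(f"break-even rework rate: {r:.1%}")
```

Under this model, the +16% gain is cancelled out once the rework rate rises from 15% to roughly 27%.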

I appreciate you reading my stuff and taking the data seriously. Thank you.

andrekandre 3 hours ago | parent [-]

  > But even if it were not true, a 16% throughput improvement is pretty weak given the investment - especially given the direct evidence of quality degradation. IMO.

n=1, but at $JOB we have throughput quotas now, and what is happening is that teams are just finding lots of busywork (renaming things, gardening of ai .md files, rewriting uis, etc.) and also dividing prs into smaller chunks to match the quotas... so even a "throughput increase" doesn't say much if it's not improving the customer outcome (ime anyway)

balefulboy 5 hours ago | parent | prev [-]

METR's time horizon is not a reliable metric of LLM capability growth: https://www.transformernews.ai/p/against-the-metr-graph-codi...