benreesman 19 hours ago

As someone who has historically been very much an LLM inevitabilism skeptic and has recently decided that we've crossed the breakeven point with indiscriminate use of Opus 4: eh, it's precisely because we're in late LLM === AGI hype world. They're actually cutting the shit on "this can do anything, and in a month, twice that!". This thing is crazy operator-aligned, wildly SFT'd on curated codebases, and running at a TTFT and cost that mean it's basically Chinchilla maxed out. Back to work boys, sell some NVIDIA stock.

This is precisely the opposite data point to the one you'd expect if the TESCREAL hype men were right: you do that when the writing is on the wall that this thing is uniquely suited to coding, and the only way you'll ever do better than quantize it and ad-support it is to go after a deep-pocketed vertical (our employers).

Nothing whatsoever to do with making a military drone or a car that can handle NYC or an Alexa that's useful instead of an SNL skit. That's other ML (very cool ML).

So the frontier lab folks have finally replaced the information commons they first destroyed, except you need a nuclear reactor and a bunch of Taiwan hawks that make Dick Cheney look like a weak-kneed feminist to run it at a loss forever.

The thing is, this kind of one ine itabalism isn't new: David Graeber spent a luminous career tearing strips off of hacks like Harari for the same exact moral and intellectual failure perpetrated by the same class warfare dynamics for the same lowbrow reasons.

polotics 18 hours ago | parent | next [-]

Can you translate "SFT'd" and "TTFT" and "TESCREAL" for the less clued-in members of the audience? On "one ine itabalism" I just gave up.

tricorn 5 hours ago | parent | next [-]

I just selected some of the text and my browser told me what they meant along with some background and some links for more information. The "one ine itabilism" actually found this conversation as a reference ...

sudhirb 16 hours ago | parent | prev | next [-]

I think:

SFT = Supervised Fine-Tuning
TTFT = Time To First Token
TESCREAL = https://en.wikipedia.org/wiki/TESCREAL (bit of a long definition)

"on ine itabalism" = online tribalism?

aspenmayer 16 hours ago | parent | prev [-]

> one ine itabalism

online tribalism?

> SFT'd

supervised fine tuned?

> TTFT

test-time fine tune?

> TESCREAL

https://en.wikipedia.org/wiki/TESCREAL

ACCount36 14 hours ago | parent | prev [-]

This comment is absolute bullshit.

It starts off being wrong ("Opus 4 has maxed out LLM coding performance"), then keeps being wrong ("LLM inference is sold at a loss"), and tries to mask just how wrong it is at any point in time by pivoting from one flavor of bullshit to another on a dime, running laps like a manic headless chicken.

benreesman 14 hours ago | parent [-]

"Chinchilla maxed out" refers to the so-called Chinchilla scaling law from the famous DeepMind paper, about how, in that particular regime, scale seemed to just flow like the spice. That happens sometimes, until it doesn't.
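
For anyone who hasn't read the paper, here's a rough back-of-the-envelope sketch of what the Chinchilla rule says. It uses the common approximations C ≈ 6·N·D and roughly 20 training tokens per parameter from Hoffmann et al. (2022); the numbers are illustrative only, not a claim about Opus or any other specific model:

    # Rough sketch of the Chinchilla compute-optimal rule (Hoffmann et al., 2022).
    # Assumes C ~= 6*N*D and the paper's ~20 tokens-per-parameter finding.
    def chinchilla_optimal(compute_flops):
        # C = 6*N*D with D = 20*N  =>  C = 120*N^2  =>  N = sqrt(C/120)
        params = (compute_flops / 120) ** 0.5
        tokens = 20 * params
        return params, tokens

    for c in (1e21, 1e23, 1e25):
        n, d = chinchilla_optimal(c)
        print(f"C={c:.0e} FLOPs -> ~{n:.2e} params, ~{d:.2e} tokens")

The point being: once you're training at the compute-optimal ratio, more loss reduction means more FLOPs and more data in lockstep, which is exactly the "pour NVIDIA in" curve I'm saying is near its tail end.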

I didn't say the coding performance was maxed out; I said the ability to pour NVIDIA in and have performance come out the other side is at its tail end. We will need architectural innovations to make the next big discontinuous leap (e.g. `1106-preview`).

They're doing things they don't normally do right: letting loose on the safety alignment bullshit and operator-aligning it, fine-tuning it on things like nixpkgs (cough defense cough), and generally not pretending it's an "everything machine" anymore.

This is state of the art Google/StackOverflow/FAANG-megagrep in 2025, and it's powerful (though the difference between this and peak Google/SO might be less than many readers realize: pre-SEO Google also spit out working code for most any query).

But it's not going to get twice as good next month or the month after that. They'd still be selling the dream on the universal magic anything machine if it were. And NVIDIA wouldn't be heavily discounted at every provider that rents it.