In six months DeepSeek won't be SOTA anymore and usage will be way down.

randomgermanguy 2 days ago | parent | next [-]

Comparing only on SOTA scores (ignoring price etc.) is like choosing your daily driver by looking at who makes the fastest sports car...

▲

LinXitoW 2 days ago | parent | next [-]

The constant improvements in SOTA are the main thing keeping the investment machine running. We can't really separate training costs from inference costs, because a bunch of the funding and loans for the inference hardware only exists because of the promises the continuous training (tries to) provides.

▲

dnnddidiej 2 days ago | parent | prev [-]

Not really. SOTA vs. non-SOTA is "can I get my coding work actually done today" vs. "this can do customer support chat".

It is like car vs. kick scooter.

▲

regularfry 2 days ago | parent | next [-]

It really isn't. We get coding work actually done today on Opus 4.5. That's not SOTA any more, and anything proximate to that level, even quite loosely, is genuinely useful.

▲

dnnddidiej 2 days ago | parent [-]

OK, so we are in "Opus 4.5 is not SOTA" territory. Right, by that definition... yes, you are right.

▲

randomgermanguy 2 days ago | parent [-]

I mean, it's almost half a year; I think that counts?

▲

dnnddidiej 2 days ago | parent [-]

Time-wise, you are correct.

▲

randomgermanguy 2 days ago | parent | prev [-]

> "can I get my coding work actually done today" vs. "this can do customer support chat"

I think you need to define "can get coding work done" for this to make sense. I was using GPT-3 back then for basic scripts; does that count? Or only Claude Code?

I also think this is a false dichotomy: if you look at Project Vend or Vending-Bench, customer support etc. is by no means trivial. (Old but great story: https://www.businessinsider.com/car-dealership-chevrolet-cha...)

▲

UlisesAC4 2 days ago | parent [-]

This. I have been doing my side-hustle coding with OpenCode and the 3.2 reasoner, and it is way better than what I have at my day job with Copilot and whatever models are there.

▲

wahnfrieden a day ago | parent | next [-]

Copilot is a bad harness that perverts the productivity of models like GPT 5.5.

▲

dnnddidiej 2 days ago | parent | prev [-]

Tell me more please!

▲

2ndorderthought 2 days ago | parent | prev | next [-]

A huge proportion of those scores are gamed anyway. Use whatever works for you at the price and availability you can afford.

▲

Palmik 2 days ago | parent | prev | next [-]

Or there will be DSv4.1/2/3 ;)

▲

randomgermanguy 2 days ago | parent [-]

Definitely something in this realm; they call the models "preview" at a bunch of different points in the paper. What I'm really hoping for is a double punch like with V3 -> R1.

▲

Barbing 2 days ago | parent | prev [-]

Well, if they distilled once…