Bombthecat 2 days ago

In six months DeepSeek won't be SOTA anymore and usage will be wayyyy down.

randomgermanguy 2 days ago | parent | next [-]

Only comparing on SOTA scores (ignoring price etc.) is like choosing your daily-driver by looking at who makes the fastest sports-car...

LinXitoW 2 days ago | parent | next [-]

The constant improvements of SOTA are the main thing keeping the investment machine running. We can't really separate training costs from inference costs, because a bunch of the funding and loans for the inference hardware only exists because of the promises that continuous training makes (or tries to make).

dnnddidiej 2 days ago | parent | prev [-]

Not really. SOTA vs. non-SOTA is "can I get my coding work actually done today" vs. "this can do customer support chat".

It is like car vs. kick scooter.

regularfry 2 days ago | parent | next [-]

It really isn't. We get coding work actually done today on Opus 4.5. That's not SOTA any more, and anything proximate to that level, even quite loosely, is genuinely useful.

dnnddidiej 2 days ago | parent [-]

OK, so we are in "Opus 4.5 is not SOTA" territory. Right, by that definition... yes, you are right.

randomgermanguy 2 days ago | parent [-]

I mean, it's almost half a year. I think that counts?

dnnddidiej 2 days ago | parent [-]

Time-wise, you are correct.

randomgermanguy 2 days ago | parent | prev [-]

> "can I get my coding work actually done today" vs. "this can do customer support chat"

I think you need to define "can get coding work done" for this to make sense. I was using GPT-3 back then for basic scripts; does that count? Or only Claude Code?

I also think this is a false dichotomy. If you look at Project Vend or Vending-Bench, customer support etc. is by no means trivial. (Old but great story: https://www.businessinsider.com/car-dealership-chevrolet-cha...)

UlisesAC4 2 days ago | parent [-]

This. I have been doing my side-hustle code with open code and 3.2 reasoner, and it is way better than what I have at my day job with Copilot and whatever models are there.

wahnfrieden a day ago | parent | next [-]

Copilot is a bad harness that perverts the productivity of models like GPT 5.5.

dnnddidiej 2 days ago | parent | prev [-]

Tell me more please!

2ndorderthought 2 days ago | parent | prev | next [-]

A huge proportion of those scores are gamed anyway. Use whatever works for you at the price and availability you can afford.

Palmik 2 days ago | parent | prev | next [-]

Or there will be DSv4.1/2/3 ;)

randomgermanguy 2 days ago | parent [-]

Definitely something in this realm; they call the models "preview" at a bunch of different points in the paper.

What I'm really hoping for is a double punch like with V3 -> R1.

man4 2 days ago | parent | prev | next [-]

[dead]

Barbing 2 days ago | parent | prev [-]

Well, if they distilled once…