|
| ▲ | randomgermanguy 2 days ago | parent | next [-] |
| Only comparing on SOTA scores (ignoring price etc.) is like choosing your daily-driver by looking at who makes the fastest sports-car... |
| |
| ▲ | LinXitoW 2 days ago | parent | next [-] | | The constant improvements of SOTA are the main thing keeping the investment machine running. We can't really separate training costs from inference costs, because much of the funding and loans for the inference hardware exists only because of the promises that continuous training (tries to) deliver on. | |
| ▲ | dnnddidiej 2 days ago | parent | prev [-] | | Not really. SOTA vs. non-SOTA is "can I get my coding work actually done today" vs. "this can do customer support chat". It is like car vs. kick scooter. | | |
| ▲ | regularfry 2 days ago | parent | next [-] | | It really isn't. We get coding work actually done today on Opus 4.5. That's not SOTA any more, and anything proximate to that level, even quite loosely, is genuinely useful. | | |
| ▲ | dnnddidiej 2 days ago | parent [-] | | OK, so we are in "Opus 4.5 is not SOTA" territory. Right, by that definition... yes, you are right. | | |
| |
| ▲ | randomgermanguy 2 days ago | parent | prev [-] | | > "can I get my coding work actually done today" vs. "this can do customer support chat" I think you need to define "can get coding work done" for this to make sense. I was using GPT-3 back then for basic scripts; does that count? Or only Claude Code? I also think this is a false dichotomy: if you look at Project Vend or Vending-Bench, customer support etc. is by no means trivial. (Old but great story: https://www.businessinsider.com/car-dealership-chevrolet-cha...) | | |
| ▲ | UlisesAC4 2 days ago | parent [-] | | This. I have been doing my side-hustle coding with open code and 3.2 reasoner, and it is way better than what I have at my day job with Copilot and whatever models are there. | | |
|
|
|
|
| ▲ | 2ndorderthought 2 days ago | parent | prev | next [-] |
| A huge proportion of those scores are gamed anyway. Use whatever works for you at a price and availability you can afford. |
|
| ▲ | Palmik 2 days ago | parent | prev | next [-] |
| Or there will be DSv4.1/2/3 ;) |
| |
| ▲ | randomgermanguy 2 days ago | parent [-] | | Definitely something in this realm; they call the models "preview" at a bunch of different points in the paper. What I'm really hoping for is a double punch like with V3 -> R1. |
|
|
|
| ▲ | Barbing 2 days ago | parent | prev [-] |
| Well, if they distilled once… |