| ▲ | randomgermanguy 2 days ago |
| Comparing only on SOTA scores (ignoring price, etc.) is like choosing your daily driver by looking at who makes the fastest sports car... |
|
| ▲ | LinXitoW 2 days ago | parent | next [-] |
| The constant improvements in SOTA are the main thing keeping the investment machine running. We can't really separate training costs from inference costs, because a lot of the funding and loans for the inference hardware only exist because of the promises that continuous training (tries to) deliver on. |
|
| ▲ | dnnddidiej 2 days ago | parent | prev [-] |
| Not really. SOTA vs. non-SOTA is "can I get my coding work actually done today" vs. "this can do customer support chat". It is like a car vs. a kick scooter. |
| |
| ▲ | regularfry 2 days ago | parent | next [-] | | It really isn't. We get coding work actually done today on Opus 4.5. That's not SOTA any more, and anything proximate to that level, even quite loosely, is genuinely useful. | | |
| ▲ | dnnddidiej 2 days ago | parent [-] | | OK, so we are at "Opus 4.5 is not SOTA". Right, by that definition... yes, you are right. | | |
| |
| ▲ | randomgermanguy 2 days ago | parent | prev [-] | | > "can I get my coding work actually done today" vs. "this can do customer support chat" I think you need to define "can get coding work done" for this to make sense. I was using GPT-3 back then for basic scripts; does that count? Or only Claude Code? I also think this is a false dichotomy: if you look at Project Vend or Vending-Bench, customer support etc. is by no means trivial. (Old but great story: https://www.businessinsider.com/car-dealership-chevrolet-cha...) | | |
| ▲ | UlisesAC4 2 days ago | parent [-] | | This. I have been writing my side-hustle code with open code and 3.2 reasoner, and it is way better than what I have at my day job with Copilot and whatever models are there. | | |
|
|