catigula 10 hours ago
I know this is a little controversial, but I think the lack of performance on SWE-bench is hugely disappointing economically. These models don't have any viable path to profitability if they can't take engineering jobs.
martinald 10 hours ago
I thought that, but it does do a lot better on other benchmarks. Perhaps SWE-bench just doesn't capture a lot of the improvement? If the web design improvements people have been posting on Twitter are representative, I suspect this will be a huge boon for developers. SWE-bench is really testing bug fixing and feature dev more. Anyway, let's see. I'm still hyped!
Workaccount2 8 hours ago
People here, and in tech in general, are so lost in the sauce. According to OpenAI at least, which probably produces the most tokens of all the labs (if we don't count Google AI Overviews and other unrequested AI bolt-ons), programming tokens account for ~4% of total generations. That's nothing. The returns will come from everyone and their grandma paying $30-100/mo to use the services, just like everyone pays for a cell phone and electricity. Don't be fooled, we are still in the "open hands" start-up phase of LLMs. The "enshittification" will follow.
api 9 hours ago
Really? If they can make an engineer more productive, that's worth a lot. Naive napkin math: 1.5x productivity on one $200k/year engineer is worth $100k/year.
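The napkin math above can be sketched as follows. This is only an illustration, assuming an engineer's output is worth exactly their salary and that a productivity multiplier scales that value linearly; the function name `added_value_per_year` is hypothetical.

```python
def added_value_per_year(salary: float, productivity_multiplier: float) -> float:
    """Extra yearly value from a productivity boost.

    Assumption: the engineer's baseline output is valued at their salary,
    so only the part of the multiplier above 1.0 counts as added value.
    """
    return salary * (productivity_multiplier - 1.0)


# 1.5x productivity on one $200k/year engineer
print(added_value_per_year(200_000, 1.5))  # 100000.0
```

Under these assumptions the gain depends only on the part of the multiplier above 1.0, so a 1.0x multiplier adds nothing.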