|
| ▲ | indoordin0saur 3 days ago | parent | next [-] |
| Really depends on what you're working in. For me, I work with a lot of data frameworks that are maybe underrepresented in these models' training sets, and they still tend to get things wrong. The other issue is that business logic is complex to describe in a prompt, to the point where giving the model all the context and business logic it needs to succeed is almost as much work as doing it myself. As a data engineer I still only find models useful for small chunks of code or for filling in tedious boilerplate to get things moving. |
| |
| ▲ | blonder 3 days ago | parent | next [-] | | Agreed. For common use cases like creating a simple LMS, Opus is shockingly good, saving hours upon hours of reinventing the wheel. At other things, like simple queries to and interactions with our ERP system, it is still quite poor, and it increases development time rather than shortening it. | |
| ▲ | drzaiusx11 2 days ago | parent | prev [-] | | Just anecdotal, but I work on some fairly left-field service architectures; today it was a highly parallelized state machine processor operating on an in-house binary protocol. Opus 4.6 had no issue correctly identifying and mitigating a hairy out-of-order state corruption issue involving a non-trivial sequence of runtime conditions from thrown errors and failed recoveries. This was simply from having access to the code repository and a brief description of the observed behavior that I provided. Naturally I verified it wasn't bullshitting me, and sure enough it was correct. Impressive really, given none of the specifics could have been in its training set, but I guess we're finding that nothing really is "new", just a remix of what's come before in various recombinations. |
|
|
| ▲ | alistairSH 3 days ago | parent | prev [-] |
| How is success defined in those metrics? Is success "perfect - can deploy to prod immediately" or "saved some arbitrary amount of engineering time"? Anecdotal experience from my team of 15 engineers is that we rarely get "perfect", but we do get enough for massive time savings across several common problem domains. |
| |
| ▲ | Esophagus4 2 days ago | parent [-] | | I think for me, it’s not so much an objective success metric as it is showing its progression over time. What amazes me is how fast LLMs are progressing. And it still feels like early days (!). For methodology, I would check out the METR website though; they’ve published their results. |
|