aeonfox 2 days ago
Just to test the OP article's theory: I was about to write some unit tests, so I decided to let Opus 4.5 have a go. It did a pretty good job, but I probably spent as much time parsing what it had done as I would have spent writing the code from scratch. I still needed to clean it up, and, unsurprisingly, it had written a few tests that only really exercised its own mocks. That's the kind of mistake I wouldn't be caught dead sending in for peer review. I'm glad the OP feels fine just letting Opus do whatever it wants without pausing to look under the covers, and perhaps we all have to learn to stop worrying and love the LLM? But I think that, here and now, we're witnessing just another hype article written by a professional blogger and speaker who's highly motivated to write engagement bait like this.
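To make the "tests that only exercise the mocks" failure concrete, here is a minimal Python sketch using the standard library's `unittest.mock`. The function `average_rating` and the `get_ratings` method on its client are hypothetical names invented for illustration, not code from the comment. The first test only checks that the mock returns what it was configured to return, so it would still pass if `average_rating` were broken. The second keeps the mock for the dependency but asserts on the behaviour of the code under test.

```python
from unittest import mock


def average_rating(ratings_client, title):
    """Hypothetical code under test: fetch ratings for a title and average them."""
    ratings = ratings_client.get_ratings(title)
    if not ratings:
        return None
    return sum(ratings) / len(ratings)


def test_mock_only():
    # Anti-pattern: this only verifies the mock's configured return value.
    # average_rating() is never called, so a bug in it can never fail this test.
    client = mock.Mock()
    client.get_ratings.return_value = [4, 5]
    assert client.get_ratings("Alien") == [4, 5]


def test_real_logic():
    # The mock stands in for the (hypothetical) ratings service, but the
    # assertions target what average_rating() actually computes.
    client = mock.Mock()
    client.get_ratings.return_value = [4, 5]
    assert average_rating(client, "Alien") == 4.5
    client.get_ratings.assert_called_once_with("Alien")

    # Edge case: no ratings should yield None rather than a ZeroDivisionError.
    client.get_ratings.return_value = []
    assert average_rating(client, "Alien") is None
```

Both functions follow pytest's `test_` naming convention, so `pytest` would collect them, but they can also be called directly.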
benjiro 2 days ago
That is the thing... How long ago did we get agent mode? In Copilot, that thing is only 7 months old. Things evolve faster than people realize: agent mode, then MCP servers, then subagents, and now RAG databases that let LLMs retrieve data directly. LLM development looks slow, but each iteration improves things. Ask yourself: what would the result of those same tests have been 21 months ago, with Claude 3.0? How about Claude 4.0, which was only 8 months ago? Right now Opus 4.5 is darn functional. The issue is usually not the code it writes; more often it gets stuck on "it's too complex, let me simplify it", and the biggest limitation is often context capacity. LLMs are still bad at deeper tasks, but compared to previous models, the jumps have been enormous. What about a year from now? Two years? I have a hard time believing that Claude 3 came out not even 2 years ago, just 21 months ago. We considered that a massive jump, useful for working on a single file... Now we're throwing entire codebases at it, and it's darn good at debugging, editing, etc. Do I like the results? No, there are plenty of times the results aren't what "I wanted", but that's often because my own prompting was too generic. LLMs are never really going to replace experienced programmers, but boy, is the progress scary.
| ||||||||
theshrike79 2 days ago
This is only true if the code it wrote is something you can just sit down and write without any reference. Now try something like what I did: an application that can fetch your IMDB/Letterboxd/Goodreads/Steam libraries and store them locally (own your data), and that also uses OMDB/TMDB to enrich the movie and TV show data. If you can write all that code faster than you can read what Claude did, I salute you and will subscribe to your Substack and YouTube channels :) Oh, by the way, none of Goodreads, IMDB, or Letterboxd has a proper export API, so you need playwright-style browser automation to get the data out. Writing all that code yourself and debugging the mess is going to take hours and hours. Claude one-shotted the Steam API access (with Sonnet 3.7; this was a long time ago), as well as the enrichment of the input data from the different sources.
| ||||||||