cwsx 15 hours ago

I've been using `claude-4-sonnet` for the last few hours - haven't been able to test `opus` yet as it's still overloaded - but I have noticed a massive improvement so far.

I spent most of yesterday working on a tricky refactor (in a large codebase), rotating through `3.7/3.5/gemini/deepseek`, and barely making progress. I want to say I was running into context issues (even with very targeted prompts) but 3.7 loves a good rabbit-hole, so maybe it was that.

I also added a new "ticketing" system (via rules) to help its task-specific memory, which I didn't really get to test with 3.7 (before 4.0 came out), so I'm unsure how much of an impact this has.
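For anyone curious what I mean by a rules-based ticketing system: it's just instructions in the editor's rules file telling the model to keep a lightweight per-task "ticket" it can re-read later. A minimal sketch of the idea (the file names and wording here are hypothetical, not my exact setup):

```markdown
# rules file (hypothetical sketch)
- Before starting a task, create tickets/<task-name>.md containing the goal,
  acceptance criteria, and the files likely to change.
- After each meaningful change, append a one-line progress note to that ticket.
- At the start of a new conversation, read the relevant ticket first to
  restore task-specific context before touching any code.
```

The point is to give the model a durable, task-scoped memory that survives across chats, instead of relying on it re-deriving context every time.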

Using 4.0, the rest of this refactor (est. ~4 hrs with 3.7) took `sonnet-4.0` 45 minutes, including updating all of the documentation and tests (which with 3.7 normally requires multiple additional prompts, despite being outlined in my rules files).

The biggest differences I've noticed:

  - much more accurate/consistent; it actually finishes tasks rather than claiming it's done when nothing works

  - less likely to get stuck in a rabbit hole

  - no longer gets stuck when unable to fix something (cycling through the same 3 solutions over and over)

  - runs for MUCH longer without my intervention

  - when using 3.7:

     - had to prompt once every few minutes, 5-10 mins MAX if the task was straightforward enough

     - had to cancel the output in ~1 in 4 prompts as it'd get stuck in the same thought loops

     - needed to restore from a previous checkpoint every few chats/conversations

  - with 4.0:

    - I've had 4 hours of basically one-shotting everything

    - prompts run for 10 mins MIN, and the output actually works

    - it remembers to run tests, fix errors, update docs, etc.

Obviously this is purely anecdotal - and, considering the temperament of LLMs, maybe I've just been lucky and will be back to cursing at it tomorrow - but imo this is the best-feeling model since 3.5 released.