htrp a day ago

Allegedly, Claude Opus 4 can run autonomously for 7 hours (basically automating an entire SWE workday).

jeremyjh a day ago

Which sort of workday? The sort where you rewrite your code 8 times and end the day with no marginal business value produced?

renewiltord a day ago

Well, Claude 3.7 definitely did the one where it was supposed to process a file and finally settled on `fs.copyFile(src, dst)`, which I think is pro-level interaction. I want those $0.95 back.

But I love you Claude. It was me, not you.

readthenotes1 a day ago

Well, at least it doesn't distract your coworkers and disrupt their flow.

kaoD a day ago

I'm already working on the Slack MCP integration.

jeremyjh a day ago

Please encourage it to use lots of emojis.

htrp a day ago

>Rakuten validated its capabilities with a demanding open-source refactor running independently for 7 hours with sustained performance.

From the customer testimonials in their announcement; more below:

>Cursor calls it state-of-the-art for coding and a leap forward in complex codebase understanding. Replit reports improved precision and dramatic advancements for complex changes across multiple files. Block calls it the first model to boost code quality during editing and debugging in its agent, codename goose, while maintaining full performance and reliability. Rakuten validated its capabilities with a demanding open-source refactor running independently for 7 hours with sustained performance. Cognition notes Opus 4 excels at solving complex challenges that other models can't, successfully handling critical actions that previous models have missed.

>GitHub says Claude Sonnet 4 soars in agentic scenarios and will introduce it as the base model for the new coding agent in GitHub Copilot. Manus highlights its improvements in following complex instructions, clear reasoning, and aesthetic outputs. iGent reports Sonnet 4 excels at autonomous multi-feature app development, as well as substantially improved problem-solving and codebase navigation—reducing navigation errors from 20% to near zero. Sourcegraph says the model shows promise as a substantial leap in software development—staying on track longer, understanding problems more deeply, and providing more elegant code quality. Augment Code reports higher success rates, more surgical code edits, and more careful work through complex tasks, making it the top choice for their primary model.

paxys a day ago

I can write an algorithm to run in a loop forever, but that doesn't make it equivalent to infinite engineers. It's the output that matters.

catigula a day ago

That is quite the allegation.

speedgoose a day ago

Easy. I can also make nanoGPT run for 7 hours by running inference on a 68k, and make it produce as much value as I usually do.

victorbjorklund a day ago

Makes no sense to measure it in hours. You can have a slow CPU making the model run for longer.
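To put numbers on that: wall-clock duration is just tokens generated divided by throughput, so the same job stretches to any number of hours on slower hardware. A minimal TypeScript sketch; the workload size and tokens-per-second figures are hypothetical, not measurements of any model.

```typescript
// Wall-clock hours needed to generate a fixed number of tokens
// at a given throughput (tokens per second). All inputs are hypothetical.
function hoursToGenerate(totalTokens: number, tokensPerSecond: number): number {
  if (tokensPerSecond <= 0) {
    throw new Error("tokensPerSecond must be positive");
  }
  return totalTokens / tokensPerSecond / 3600;
}

// The same hypothetical workload (1.26 million output tokens) on two setups.
const workload = 1_260_000;
const fast = hoursToGenerate(workload, 50); // 25,200 s, i.e. 7 hours
const slow = hoursToGenerate(workload, 5);  // 252,000 s, i.e. 70 hours

console.log(`fast: ${fast} h, slow: ${slow} h`);
```

Hardware that is ten times slower turns identical output into a ten-times-longer run, which is why hours alone say little about work done.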

krelian a day ago

How much does it cost to have it run for 7 hours straight?
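A rough way to reason about it: API cost scales with tokens processed, not with hours. A minimal sketch; the per-million-token prices and token counts below are placeholders rather than quoted rates, so substitute the provider's current pricing and your own usage numbers.

```typescript
// Per-million-token prices in USD. Placeholder values; check current pricing.
interface Pricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

// Estimated USD cost of a run from total input and output token counts.
function runCost(inputTokens: number, outputTokens: number, p: Pricing): number {
  return (inputTokens / 1e6) * p.inputPerMTok + (outputTokens / 1e6) * p.outputPerMTok;
}

// Hypothetical 7-hour agent run. An agent loop typically resends its
// accumulated context on each call, so input tokens often dominate.
const pricing: Pricing = { inputPerMTok: 15, outputPerMTok: 75 };
const cost = runCost(20_000_000, 1_000_000, pricing);
console.log(`estimated cost: $${cost.toFixed(2)}`);
```

Under these made-up inputs the run costs $375.00; the point is that the bill depends on token volume and pricing, so two "7-hour" runs can differ in cost by orders of magnitude.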