fhd2 4 hours ago

Their Readme.md is weirdly obsessed with "2 hours":

"before Claude Opus 4.5 started doing better than humans given only 2 hours"

"Claude Opus 4.5 in a casual Claude Code session, approximately matching the best human performance in 2 hours"

"Claude Opus 4.5 after 2 hours in our test-time compute harness"

"Claude Sonnet 4.5 after many more than 2 hours of test-time compute"

So that does make one wonder where this comes from. It could just be LLM-generated with a talking point of "2 hours"; models can fall in love with that kind of stuff. "after many more than 2 hours" is a bit of a tell.

Would be quite curious to know, though. Here's how I usually design take-home assignments:

1. The candidate has several _days_ to complete it (usually around a week).

2. I design the task to only _take_ 2-4 hours, informing the candidate about that, but that doesn't mean they can't take longer. The subsequent interview usually reveals if they went overboard or struggled more than expected.

But I can easily picture some places sending a candidate the assignment and asking them to hand in their work within two hours. Similar to good old coding competitions.