forgotpwd16 4 hours ago
Unless I misread, the 2 hours isn't the time limit for the candidate but the time Claude eventually needed to outperform the best submitted solution. The best candidate could have taken anywhere from 6 hours to 2 days to achieve that result.
fhd2 4 hours ago
Their Readme.md is weirdly obsessed with "2 hours":

"before Claude Opus 4.5 started doing better than humans given only 2 hours"

"Claude Opus 4.5 in a casual Claude Code session, approximately matching the best human performance in 2 hours"

"Claude Opus 4.5 after 2 hours in our test-time compute harness"

"Claude Sonnet 4.5 after many more than 2 hours of test-time compute"

So that does make one wonder where this comes from. Could just be LLM generated with a talking point of "2 hours"; models can fall in love with that kind of stuff. "after many more than 2 hours" is a bit of a tell. Would be quite curious to know, though.

How I usually design take-home assignments:

1. The candidate has several _days_ to complete it (usually around a week).

2. I design the task to only _take_ 2-4 hours and tell the candidate as much, but that doesn't mean they can't take longer.

The subsequent interview usually reveals whether they went overboard or struggled more than expected. But I can easily picture some places sending a candidate the assignment and asking them to hand in their work within two hours. Similar to good old coding competitions.
alcasa 3 hours ago
No, the 2 hours is their time limit for candidates. The thing is, you're allowed to use any non-human help on their take-homes (open book), so if AI can solve it in under 2 hours, it's not very good at assessing the human.