| ▲ | Elevator Saga: The elevator programming game (2015)(play.elevatorsaga.com) | |||||||
| 54 points by xmprt 3 days ago | 8 comments | ||||||||
| ▲ | withinboredom 2 hours ago | parent | next [-] | |||||||
This is a really great test for vibe coding. It isn't easy; it took me several hours to pass. Vibe coding the results is ... not exactly faster. I have to keep reminding it to output logs (I'm just doing this in chat and manually copy/pasting the code), it gets hung up on 'the maximum wait time' exactly equaling the challenge's limit, etc. Opus was able to generate a passing implementation on the first try up to level 7 but can't seem to pass level 12. Sonnet had to iterate on every level up to level 5, and couldn't pass that level. | ||||||||
| ▲ | alentred an hour ago | parent | prev | next [-] | |||||||
Solving it with Claude is a totally different kind of fun, of course. But anyway, the Claude browser extension is very good at it. I sent it the initial prompt, and then asked it to continue on each next challenge. It passed the first 5 challenges on the fly, and started to struggle on challenge 6, which it solved after 4 attempts. I stopped at that point because the fun was depleted. It's like role-playing the story of a software developer in the era of AI, but accelerated. The results are truly good and fast. Coding fun: zero. The new fun is prompt/context engineering. <elevator_saga_solver_prompt> You are a JavaScript developer. On this page you are presented with a coding challenge to solve: an elevator to program in JavaScript. Analyze the page, take a screenshot to understand the floor and elevator layout (how many floors, how many elevators), see the sample code in the solution text box and replace it with your solution for the challenge. Keep the solution simple, just sophisticated enough to solve the task at hand; do not over-engineer or optimize unless your initial solution fails. After you insert the solution into the text box, click the "Start" button to test it. After the time limit set for a solution (it is indicated on the page), verify whether the solution worked: read the page or take a screenshot. If it didn't work, try a new, better solution. If it worked, your task is complete. See the API documentation here: https://play.elevatorsaga.com/documentation.html#docs . </elevator_saga_solver_prompt> | ||||||||
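For anyone wondering what "simple, just sophisticated enough" might look like for a one-elevator level, here is a minimal sketch. It is not the code Claude produced; it just queues stops first come, first served, using calls I believe are in the linked API docs (`goToFloor`, `floorNum`, and the `floor_button_pressed`, `up_button_pressed`, `down_button_pressed` events). The `const solution =` binding is my own addition so the snippet stands alone; in the game you would paste only the object literal.

```javascript
// Naive first-come, first-served controller for a single elevator.
// Assumption: only elevators[0] is used, so extra elevators stay parked.
const solution = {
  init: function (elevators, floors) {
    const elevator = elevators[0];

    // A passenger pressed a floor button inside the car: queue that floor.
    elevator.on("floor_button_pressed", function (floorNum) {
      elevator.goToFloor(floorNum);
    });

    // Someone pressed a call button on a floor: queue a stop there.
    floors.forEach(function (floor) {
      const call = function () {
        elevator.goToFloor(floor.floorNum());
      };
      floor.on("up_button_pressed", call);
      floor.on("down_button_pressed", call);
    });
  },

  update: function (dt, elevators, floors) {
    // Called on every simulation tick; this sketch does nothing here.
  },
};
```

This ignores direction, load, and duplicate stops, so I would expect it to run out of steam on the later levels, which is roughly where the interesting work starts.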
| ▲ | darkstarsys 9 minutes ago | parent | prev | next [-] | |||||||
I've been fascinated by elevator algorithms since visiting NYC as a kid. The interesting stuff starts to happen when you account for popular floors, people going to work, coming home at the end of the day, dog-walking times, subway arrivals, all the semi-deterministic behavior we see in real life. | ||||||||
| ▲ | Meekro an hour ago | parent | prev | next [-] | |||||||
I thought it was fun to search for a solution that can beat every level (eventually found one!) As far as I know, no LLM can do this on its own, which tells us something about the kind of problems they’re weak at. | ||||||||
| ▲ | technothrasher 2 hours ago | parent | prev | next [-] | |||||||
AKA the hard drive scheduling game. Takes me back to my first algorithms class in school thirty-five years ago. | ||||||||
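For readers who haven't seen the connection: the classic disk-head scheduler is literally called the elevator algorithm (SCAN). A rough sketch of the ordering it produces, in the LOOK variant that reverses at the last pending request rather than at the end of the shaft. The function name `scanOrder` and its parameters are made up for illustration.

```javascript
// Order pending floor requests the way a LOOK/SCAN-style scheduler would:
// serve everything ahead in the current direction, then reverse.
function scanOrder(current, requests, goingUp) {
  // Drop duplicates and any request for the floor we are already on.
  const unique = [...new Set(requests)].filter((f) => f !== current);
  // Floors above are visited in ascending order, floors below in descending order.
  const above = unique.filter((f) => f > current).sort((a, b) => a - b);
  const below = unique.filter((f) => f < current).sort((a, b) => b - a);
  return goingUp ? [...above, ...below] : [...below, ...above];
}

console.log(scanOrder(3, [7, 1, 5, 0, 4], true)); // [ 4, 5, 7, 1, 0 ]
```

Swap "floor" for "cylinder" and you have the textbook disk version.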
| ▲ | eknkc 2 hours ago | parent | prev | next [-] | |||||||
This kind of stuff can be a great LLM benchmark, as Opus basically screwed it up and created a monstrosity of a solution on the first try. | ||||||||
| ▲ | agentultra 2 hours ago | parent | prev [-] | |||||||
Fun! Reminds me that one of my favourite exercises in TLA+ is to design an elevator call system. | ||||||||