Remix.run Logo
measurablefunc 3 hours ago

Generate instructions for their simulator to compute some numbers (hashes) in whatever is considered the memory of their "machine"¹. I didn't see any places where they actually disallow cheating b/c it says they only check the final state of the memory² so seems like if you know the final state you could just "load" the final state into memory. The cycle count is supposedly the LLM figuring out the fewest number of instructions to compute the final state but again, it's not clear what they're actually measuring b/c if you know the final state you can cheat & there is no way to tell how they're prompting the LLM to avoid the answers leaking into the prompt.

¹https://github.com/anthropics/original_performance_takehome/...

²https://github.com/anthropics/original_performance_takehome/...

saagarjha 3 hours ago | parent [-]

Well, they read your code in the actual hiring loop.

measurablefunc 3 hours ago | parent [-]

My point still stands. I don't know what the LLM is doing so my guess is it's cheating unless there is evidence to the contrary.

red75prime 2 hours ago | parent | next [-]

I guess your answer to "Try to run Claude Code on your own 'ill-defined' problem" would be "I'm not interested." Correct? I think we can stop here then.

KeplerBoy 43 minutes ago | parent | prev | next [-]

Well that's certainly a challenge when you use LLMs for this test driven style of programming.

saagarjha 3 hours ago | parent | prev [-]

Why do you assume it’s cheating?

measurablefunc an hour ago | parent [-]

Because it's a well know failure mode of neural networks & scalar valued optimization problems in general: https://www.nature.com/articles/s42256-020-00257-z

saagarjha 15 minutes ago | parent | next [-]

Again, you can just read the code

red75prime 24 minutes ago | parent | prev [-]

And? Anthropic is not aware of this 2020 paper? The problem is not solvable?