vntok 2 hours ago

Reproducing experimental results across models and vendors is trivial and cheap nowadays.

BoredPositron an hour ago

Not if Anthropic goes further in obfuscating the output of Claude Code.

vntok 7 minutes ago

Why would you test implementation details? Test what's delivered, not how it's delivered. The thinking portion, synthesized or not, is merely an implementation detail.

The resulting artefact, that's what is worth testing.