CodingJeebus 3 hours ago

100% this. We've seen enough model releases at this point to know that there hasn't been a single model rollout making bold claims about its capability that wasn't met with criticism after release.

The fact that Anthropic provides so little detail about the specifics of its prompt in an otherwise detailed report is a major sleight of hand. Why not release the prompt? It's not publicly available, so what's the harm?

We can't criticize the methods of these replication pieces when Anthropic's methodology boils down to: "just trust us."

gruez 3 hours ago

>We've seen enough model releases at this point to know that there hasn't been a single model rollout making bold claims about its capability that wasn't met with criticism after release.

Examples? All I remember are vague claims about how the new model is dumber in some cases, or that they're gaming benchmarks.