lominming 5 days ago

My main issue with many of these tests and reviews is that most of the results end up measuring the harness (in this case, likely Claude Code) rather than the model’s inherent performance.