| ▲ | DocTomoe a day ago | |
I keep wondering ... is this a good benchmark? What is a practical use-case for the skills Claude is supposed to present here? And if the author needs that particular website re-created with pixel-perfect accuracy, woulnd't it me simpler to just to it yourself? Sure, you can argue this is some sort of modern ACID-Test - but the ACID tests checked for real-world use-cases. This feels more like 'I have this one very specific request, the machine doesn't perfectly fullfill it, so the machine is at fault.'. Complaining from a high pedestal. I'm more surprised at how close Claude got in its reimagined SpaceJam-site. | ||