Remix.run Logo
yosefk 3 days ago

The examples are from the latest versions of ChatGPT, Claude, Grok, and Google AI Overview. I did not bother to list the full conversations because (A) LLMs are very verbose and (B) nothing ever reproduces, so in any case any failure is "abnormally bad." I guess dismissing failures and focusing on successes is a natural continuation of our industry's trend to ship software with bugs which allegedly don't matter because they're rare, except with "AI" the MTBF is orders of magnitude shorter