Remix.run Logo
suddenlybananas 3 hours ago

I'm not denying any progress, I'm saying that reasoning failures that are simple which have gone viral are exactly the kind of thing that they will toss in the training data. Why wouldn't they? There's real reputational risks in not fixing it and no costs in fixing it.