micromacrofoot 4 hours ago

The latest Qwen actually performs a little better on some tasks, in my experience.

The latest Claude still fails the car wash test.

reddit_clone an hour ago

It's not just _wrong_, it's confused! The second sentence actually gets it right. This was on Friday, with Opus 4.6.

>I want to wash my car. The car wash is 50 meters away. Should I walk or drive?

Walk. It's 50 meters — you're going there to clean the car anyway, so drive it over if it needs washing, but if you're just dropping it off or it's a self-service place, walking is fine for that distance.

zozbot234 an hour ago

This is actually a good diagnostic of whether the model is skimping on the thinking loop. Try raising the thinking effort and it should get it right. Of course, if you're running this in a coding harness with a lot of extraneous context, the model will have a hard time working out what it should be thinking about.