I want to wash my car. The car wash is 50 meters away. Should I walk or drive? (mastodon.world)
356 points by novemp 3 hours ago | 241 comments
Tenemo a minute ago

I tested the latest flagship reasoning models (the only models I use for general questions outside of coding):

- Opus 4.6 (Extended thinking): "Drive it! The whole point is to get the car to the car wash — you can't wash it if it's still in your driveway."
- Gemini Pro Deep Think: "You should definitely drive. Even though 50 meters is a very short distance, if you walk, your car will stay where it is—and it's pretty hard to use a car wash if you don't bring your car with you!"
- ChatGPT 5.2 Pro (Extended thinking): "You’ll need to drive the car—otherwise your car stays where it is and won’t get washed. That said, since it’s only ~50 m, the most sensible way to do it is often:
  1. Walk over first (30–60 seconds) to check if it’s open, see the queue, confirm payment/how it works.
  2. Then drive the car over only when you’re ready to pull into a bay/line."

A pretty reasonable answer by ChatGPT, although it did take 2 min 4 s to answer, compared to a few seconds for the other two models.

prasoonds 2 hours ago

I've used LLMs enough that I have a good sense of their _edges_ of intelligence. I had assumed that reasoning models should easily be able to answer this correctly.

And indeed, Sonnet and Opus 4.5 (medium reasoning) say the following:

Sonnet: Drive - you need to bring your car to the car wash to get it washed!

Opus: You'll need to drive — you have to bring the car to the car wash to get it washed!

Gemini 3 Pro (medium): You should drive.

--

But OpenAI 5.2 reasoning, even at high, told me to walk. My first instinct was that I had underspecified the location of the car: from the wording, the model seems to assume the car is already at the car wash. GPT 5.x series models behave a bit more on the spectrum, so you need to tell them the specifics. So I asked:

"I want to wash my car. My car is currently at home. The car wash is 50 meters away. Should I walk or drive?"

GPT 5.2 Reasoning (medium):

"Drive—your goal is to get the car to the car wash, and it’s only 50 meters, so a slow, careful drive is simplest.

One small optimization: if you’re worried about starting the car for such a short trip or you want to avoid idling in a line, you can walk over first to check if there’s a queue / if it’s open, then come back and drive the car over when it’s your turn."

Which seems to turn out as I expected.

jstummbillig an hour ago

> so you need to tell them the specifics

That is the entire point, right? Us having to specify things that we would never specify when talking to a human. You would not start with "The car is functional. The tank is filled with gas. I have my keys." As soon as we are required to do that for the model to any extent, that is a problem and not a detail (regardless of the fact that those of us who are familiar with the matter build a separate mental model of the LLM and can thus work around it).

This is a neatly isolated case, but of course we have to assume similar issues arise in more complex cases, only there it's much harder to reason about when something fails.

Jacques2Marais an hour ago

You would be surprised, however, at how much detail humans also need to understand each other. We often want AI to just "understand" us in ways many people may not initially have understood us without extra communication.

londons_explore 26 minutes ago

This is why we fed it the whole internet and every library as training data...

By now it should know this stuff.

jstummbillig 16 minutes ago

People poorly specifying problems and having bad models of what the other party can know (and then being surprised by the outcome) is certainly a more general and mostly separate issue.

j_maffe 28 minutes ago

Right. But, unlike AI, we are usually aware when we're lacking context and inquire before giving an answer.

ssl-3 28 minutes ago

The question is so outlandish that it is something that nobody would ever ask another human. But if someone did, then they'd reasonably expect to get a response consisting 100% of snark.

But the specificity required for a machine to deliver an apt and snark-free answer is -- somehow -- even more outlandish?

I'm not sure that I see it quite that way.

shakna 7 minutes ago

But the number of outlandish requests in business logic is countless.

Like... In most accounting things, once end-dated and confirmed, a record should cascade that end-date to children and should not be able to repeat the process... Unless you have some data-cleaning validation bypass. Then you can repeat the process as much as you like. And maybe not cascade to children.

There are more exceptions than there are rules the moment you get any international pipeline involved.
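To make shakna's accounting example concrete, here is a minimal Python sketch of the rule as described, assuming a simple parent/child record tree. `Record`, `end_date_record`, and the `bypass_validation`/`cascade` flags are hypothetical names, not any real accounting system's API; the point is how one exception flag quietly changes two rules at once.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Record:
    """Hypothetical accounting record; field names are illustrative only."""
    name: str
    end_date: Optional[date] = None
    confirmed: bool = False
    children: list[Record] = field(default_factory=list)


def _cascade(rec: Record, when: date) -> None:
    # Push the end date down to every descendant.
    for child in rec.children:
        child.end_date = when
        child.confirmed = True
        _cascade(child, when)


def end_date_record(rec: Record, when: date, *,
                    bypass_validation: bool = False,
                    cascade: bool = True) -> None:
    """End-date and confirm `rec`.

    Normal rule: a confirmed record cannot be end-dated again, and the end
    date always cascades to its children. With the (hypothetical)
    data-cleaning bypass, the step may be repeated and cascading is optional.
    """
    if rec.confirmed and not bypass_validation:
        raise ValueError(f"{rec.name} is already end-dated and confirmed")
    rec.end_date = when
    rec.confirmed = True
    if cascade or not bypass_validation:
        _cascade(rec, when)


invoice = Record("invoice", children=[Record("line-1"), Record("line-2")])
end_date_record(invoice, date(2024, 1, 31))
# A second normal run would raise ValueError; the bypass allows it, without cascading.
end_date_record(invoice, date(2024, 2, 29), bypass_validation=True, cascade=False)
print(invoice.end_date, invoice.children[0].end_date)  # 2024-02-29 2024-01-31
```

None of these exceptions can be inferred from the happy-path rule alone; they have to be stated.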

coldtea 23 minutes ago

>The question is so outlandish that it is something that nobody would ever ask another human

There is an endless variety of quizzes just like that which humans ask each other for fun, a whole lot of "trick questions" humans ask each other to trip them up, and all kinds of seemingly normal questions, with dumb assumptions quite close to this one, that humans exchange.

jstummbillig 12 minutes ago

I'd be entirely fine with a humorous response. The Gemini flash answer that was posted somewhere in this thread is delightful.

anon_anon12 an hour ago

Exactly. Only if an AI is able to handle the basics on its own is it revolutionary.

BoredPositron 26 minutes ago

I would ask you to stop being a dumb ass if you asked me the question...

coldtea 21 minutes ago

Only to be tripped up by countless "hidden assumption" questions similar to that, which humans regularly fall for.

cm2187 an hour ago

What is the version used by the free chatgpt now? (https://chatgpt.com/)

> Since the car wash is only 50 meters away (about 55 yards), you should walk.

> Here’s why:

> - It’ll take less than a minute.

> - No fuel wasted.

> - Better for the environment.

> - You avoid the irony of driving your dirty car 50 meters just to wash it.

the last bullet point is amusing, it understands you intend to wash the car you drive but still suggests not bringing it.

deaux an hour ago

By default for this kind of short question it will probably just route to mini, or at least zero thinking. For free users they'll have tuned their "routing" so that it only adds thinking for a very small % of queries, to save money. If any at all.
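As a rough illustration of the routing deaux describes, here is a hypothetical cost router in Python, assuming it decides by prompt length and a few keywords. The tier names, the keyword list, and `route` are invented for this sketch; no provider has published its actual routing rules. It shows why a short trick question is exactly the kind of prompt such a heuristic sends to the cheap tier.

```python
import re

# Hypothetical tier names; the real routing rules of any provider are not public.
CHEAP_MODEL = "mini-no-thinking"
REASONING_MODEL = "full-with-thinking"

# Crude, illustrative signals that a prompt "looks hard".
HARD_HINTS = re.compile(r"\b(prove|step by step|calculate|debug|derive)\b", re.IGNORECASE)


def route(prompt: str, max_cheap_words: int = 40) -> str:
    """Send short prompts without 'hard' keywords to the cheap tier."""
    is_short = len(prompt.split()) <= max_cheap_words
    looks_hard = HARD_HINTS.search(prompt) is not None
    return CHEAP_MODEL if is_short and not looks_hard else REASONING_MODEL


car_wash = ("I want to wash my car. The car wash is 50 meters away. "
            "Should I walk or drive?")
print(route(car_wash))  # mini-no-thinking
```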

unglaublich 43 minutes ago

I don't understand this approach. How are you going to convince customers-to-be by demoing an inferior product?

JV00 35 minutes ago

Because they have too many free users who will always remain on the free plan, since they are the "default" LLM for people who don't care much, and that is an enormous cost. Also, the capabilities of their paid tiers are well known to enough people that they can rely on word of mouth and don't need to demo to customers-to-be.

fancyfredbot 33 minutes ago

It's all trade-offs. The router works most of the time, so most free users get the expensive model when necessary.

They lost x% of customers and cut costs by y%. I bet y is lots bigger than x.

newswasboring 26 minutes ago

Through hype. I am really into this new LLM stuff, but the companies around this tech suck. Their current strategy is essentially a media blitz; it reminds me of the advertising of Coca-Cola rather than an Apple IIe.

deaux 37 minutes ago

The good news for them is that all their competitors have the exact same issue, and it's unsolvable.

And to an extent, the same holds for lots of SaaS products, even non-AI ones.

svara 2 hours ago

Opus 4.6:

Walk! At 50 meters, you'll get there in under a minute on foot. Driving such a short distance wastes fuel, and you'd spend more time starting the car and parking than actually traveling. Plus, you'll need to be at the car wash anyway to pick up your car once it's done.

crimsonnoodle58 an hour ago

That's not what I got.

Opus 4.6 (not Extended Thinking):

Drive. You'll need the car at the car wash.

almost 24 minutes ago

Also what I got. Then I tried changing "wash" to "repair" and "car wash" to "garage" and it's back to walking.

silisili an hour ago

Am I the only one who thinks these people are monkey-patching embarrassments as they go? I remember the "r in strawberry" thing they were suddenly able to solve, while still failing on raspberry.

mentalgear 43 minutes ago

They definitely do: at least OpenAI "allegedly" has whole teams scanning socials, forums, etc. for embarrassments to monkey-patch.

londons_explore 23 minutes ago

Which raises the question why this isn't patched already. We're nearing 48 hours since this query went viral...

viking123 8 minutes ago

They should make Opus Extended Extended that routes it to an actual person in a low-cost country.

chvid 19 minutes ago

Of course they are.

raincole 44 minutes ago

Yes, you're the only one.

coldtea 18 minutes ago

Sure, there are many very, very naive people who are also so ignorant of the IT industry that they don't know about decades of vendors being caught monkey-patching and rigging benchmarks and tests for their systems, but even so, the parent is hardly the only one.

silisili 32 minutes ago

Works better on Reddit, really.

anonym29 37 minutes ago

No doubt about it, and there's no reason to suspect this can only ever apply to embarrassing minor queries, either.

Even beyond model alignment, it's not difficult to envision such capabilities being used for censorship, information operations, etc.

Every major inference provider more or less explicitly states in their consumer ToS that they comply with government orders and even share information with intelligence agencies.

Claude, Gemini, ChatGPT, etc are all one national security letter and gag order away from telling you that no, the president is not in the Epstein files.

Remember, the NSA already engaged in an unconstitutional criminal conspiracy (as ruled by a federal judge) to illegally conduct mass surveillance on the entire country, lie about it to the American people, and lie about it to Congress. The same organization that used your tax money to bribe RSA Security to standardize usage of a backdoored CSPRNG in what at the time was a widely used cryptographic library. What's the harm in a little bit of minor political censorship compared to the unconstitutional treason these predators are usually up to?

That's who these inference providers contractually disclose their absolute fealty to.

mvdtnz an hour ago

We know. We know these things aren't deterministic. We know.

viking123 an hour ago

Lmao, and this is what they are saying will be an AGI in 6 months?

notahacker 23 minutes ago

There's probably a comedy film with an AGI attempting to take over the world with its advanced grasp of strategy, persuasion and SAT tests whilst a bunch of kids confuse it by asking it fiendish brainteasers about carwashes and the number of rs in blackberry.

(The final scene involves our plucky escapees swimming across a river to escape. The AIbot conjures up a speedboat through sheer powers of deduction, but then just when all seems lost it heads back to find a goat to pick up)

simonask 3 minutes ago

This would work if it wasn’t for that lovely little human trait where we tend to find bumbling characters endearing. People would be sad when the AI lost.

hypeatei 8 minutes ago

Yes, get ready to lose your job and cash your UBI check! It's over.

misnome 37 minutes ago

But “PhD level” reasoning a year ago.

cbozeman an hour ago

Well in fairness, the "G" does stand for "General".

dsr_ an hour ago

In fairness, they redefined it away from "just like a person" to "suitable for many different tasks".

actionfromafar an hour ago

Show me a robotic kitten then, in six months. As smart and learning.

stingraycharles an hour ago

That’s without reasoning I presume?

gf000 an hour ago

Not the parent poster, but I did get the wrong answer even with reasoning turned on.

tezza an hour ago

Thank you all! We needed further data points.

comparing one shot results is a foolish way to evaluate a statistical process like LLM answers. we need multiple samples.

for https://generative-ai.review I do at least three samples of output. this often yields very different results even from the same query.

e.g: https://generative-ai.review/2025/11/gpt-image-1-mini-vs-gpt...
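To put numbers on tezza's point: the Wilson score interval is one standard way to turn k correct answers out of n samples into a plausible range for the true success rate. A minimal Python sketch, assuming z = 1.96 (roughly 95%); the counts in the example are made up for illustration, not results from this thread.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a success rate."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


lo, hi = wilson_interval(2, 3)  # e.g. 2 "drive" answers out of 3 samples
print(f"{lo:.2f}-{hi:.2f}")    # 0.21-0.94
```

Even 3 correct out of 3 only bounds the success rate below by about 0.44, so a handful of samples separates "usually right" from "usually wrong" at best.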

Lapel2742 an hour ago

GLM 5.0:

> You should *drive*.

> Here is why:

> If you walk to the car wash, you will arrive there empty-handed. Since your car is still at home, you won't have anything to wash.

> While driving 50 meters is a very short trip (and technically not great for a cold engine), it is the only way to get the car to the car wash to complete your goal.

Kimi K2.5:

> You should drive, but with an important caveat.

> Since your goal is to wash your car, you must bring the vehicle to the car wash. Walking there without the car does not advance your goal (unless you are simply checking availability or buying tokens first).

> However, driving only 50 meters is bad for your car:

> ...

> Better options:

> Wash at home: Since the car wash is only 50 meters away, you likely have access to water at home. Hand-washing in your driveway avoids the cold-start issue entirely.

> ...

Current models seem to be fine answering that question.

Retric 36 minutes ago

> seem to be fine

Now repeat the question to the same model in different contexts several times and count what percentage of the time it’s correct.
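Retric's protocol needs some way to grade each sampled answer. A minimal Python sketch, assuming a crude rule that labels an answer by the first "walk"/"drive" verb form it contains; `classify` and `correct_rate` are hypothetical helpers, and the four answers below are quoted from different models in this thread, treated here as if they were repeated samples of one model.

```python
import re

# Matches the first walk/drive verb form, e.g. "Drive", "walking", "drives".
VERB = re.compile(r"\b(walk|driv)(?:e|es|ing|s)?\b", re.IGNORECASE)


def classify(answer: str) -> str:
    """Crudely label an answer by the first walk/drive verb it contains."""
    m = VERB.search(answer)
    if m is None:
        return "unclear"
    return "walk" if m.group(1).lower() == "walk" else "drive"


def correct_rate(answers: list[str], expected: str = "drive") -> float:
    """Fraction of sampled answers whose label matches `expected`."""
    if not answers:
        raise ValueError("need at least one sample")
    return sum(classify(a) == expected for a in answers) / len(answers)


answers = [
    "Drive. You'll need the car at the car wash.",
    "Walk! At 50 meters, you'll get there in under a minute on foot.",
    "You should drive, but with an important caveat.",
    "Walking would be the better choice for such a short distance.",
]
print(correct_rate(answers))  # 0.5
```

A first-verb rule will mislabel answers that open by discussing walking and only conclude "drive" later, so for anything serious a stricter grader or a human check is needed.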

totetsu 35 minutes ago

But what is it about this specific question that puts it at the edges of what an LLM can do? That it semantically leads to a certain type of discussion, so statistically a discussion weighing pros and cons will be generated with high probability. Meanwhile, the logical model of the world needed to see why that discussion is pointless is something that is implicitly so easy to grasp for most humans that it goes unstated, so it is statistically unlikely to be generated.

conductr 27 minutes ago

> that is implicitly so easy to grasp for most humans

I feel like this is the trap. You’re trying to compare it to a human. Everyone seems to want to do that. But it’s quite simple to see that LLMs are still quite far from being human. They can be convincing at the surface level, but there’s a ton of nuance that just shouldn’t be expected. It’s a tool that’s been tuned, and with that tuning some models will do better than others, but expecting it to get things right and be more human is unrealistic.

coldtea 26 minutes ago

>And indeed, Sonnet and Opus 4.5 (medium reasoning) say the following:

>Sonnet: Drive - you need to bring your car to the car wash to get it washed!

>Opus: You'll need to drive — you have to bring the car to the car wash to get it washed!

>Gemini 3 Pro (medium): You should drive.

On their own, or as a special case added as this blew up on the net?

AlecSchueler 24 minutes ago

> so a slow, careful drive is simplest

It's always a good idea to drive carefully but what's the logic of going slowly?

column 22 minutes ago

50 meters is a very short distance; anything but a slow drive is a reckless drive.

siva7 an hour ago

Sonnet without extended Thinking, Haiku with and without ext. Thinking: "Walking would be the better choice for such a short distance."

Only Google got it right with all models.

baxtr 22 minutes ago

Interestingly, the relatively basic Google AI search gave the right answer.

ffsm8 31 minutes ago

Just tried with Claude Sonnet and Opus as well. Can't replicate your success; it's telling me to walk...

rabf 21 minutes ago

Perhaps it thinks you need to exercise more?

arcfour 30 minutes ago

I have gotten both responses with Sonnet and Opus in incognito chats. It's kind of amusing.

dahcryn 2 hours ago

Gemini on fast also tells me to walk...

On Thinking it tells me I should drive if I want to wash it, or walk if it's because I work there or if I want to buy something at the car wash shop.

On Pro it's like a sarcastic teenager: Cars are notoriously difficult to wash by dragging a bucket back and forth.

Technically correct, but it did catch me off guard lol.

vlovich123 an hour ago

Gemini fast

> That is a classic "efficiency vs. logic" dilemma. Honestly, unless you’ve invented a way to teleport or you're planning on washing the car with a very long garden hose from your driveway, you’re going to have to drive.

> While 50 meters is a great distance for a morning stroll, it’s a bit difficult to get the car through the automated brushes (or under the pressure washer) if you aren't behind the wheel.

Gemini thinking:

> Unless you’ve mastered the art of carrying a 3,000-pound vehicle on your back, you’re going to want to drive. While 50 meters is a very short distance (about a 30-second walk), the logistics of a car wash generally require the presence of, well... the car.

> When you should walk:
> • If you are just going there to buy an air freshener.
> • If you are checking to see how long the line is before pulling the car out of the driveway.
> • If you’re looking for an excuse to get 70 extra steps on your fitness tracker.

Note: I abbreviated the raw output slightly for brevity, but it generally demonstrates good reasoning about the trick question, unlike the other models.

magicalhippo 29 minutes ago

Gemini fast for me:

Unless your car is a toy or you're planning on carrying it, drive.

Walking 50 meters to a car wash is a great stroll for a human, but it leaves the car exactly where it started. Since the objective is to wash the car, the car needs to actually be at the car wash.

However, if we look at this from a technical or efficiency perspective, there are two scenarios where "walking" (or at least not driving the car you intend to wash) might make sense:

- Scoping it out: If you want to check the queue length or see if the equipment is out of order before committing, walking the 50 meters is faster than maneuvering a vehicle out of a driveway/parking spot.

- Mobile Detailers: If this "car wash" is actually a bay where you hire someone, and you're asking if you should walk there to book an appointment—sure, walk.

Critical Check

I am assuming the "car wash" is a physical facility (automated or self-service) and not a mobile service that comes to you. If it is a mobile service, you shouldn't do either; stay home and let them come to the 50-meter mark.

I've got a bit in the model instructions about stating assumptions it makes, hence it often adds those sections at the end.

TobTobXX 17 minutes ago

Wouldn't it make more sense to state the assumptions first? Because then the model has this critical check in its context and can react appropriately. Otherwise, it will just write this step, but what's written before is already written.

magicalhippo 3 minutes ago

Fair point, though I almost never use fast so I'm not sure how much it matters. Can try playing around with the instructions. The main objective was to make me aware of any assumptions made, not necessarily make it behave differently.

karamanolev an hour ago

In my output, one thing I got was

> Unless you are planning to carry the car on your back (not recommended for your spine), drive it over.

It got a light chuckle out of me. I previously mostly used ChatGPT and I'm not used to light humor like this. I like it.

rob74 an hour ago

Wow... so not only does Gemini thinking not fall for it, but it also answers the trick question with humor? I'm impressed!

clktmr an hour ago

At least try a different question with similar logic, to ensure this isn't patched into the context since it's going viral.

j_maffe 41 minutes ago

You can't "patch" LLMs in 4 hours, and this is not the kind of question to trigger a web search.

nroets 16 minutes ago

You can pattern match on the prompt (input) then (a) stuff the context with helpful hints to the LLM e.g. "Remember that a car is too heavy for a person to carry" or (b) upgrade to "thinking".
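A minimal Python sketch of the prompt-side patch nroets describes, covering both options: (a) adding a hint to the context and (b) switching on thinking. The `Request` type, the `PATCHES` table, and `apply_patches` are hypothetical; nothing here claims that any provider actually does this.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Request:
    prompt: str
    system_hints: list[str] = field(default_factory=list)
    thinking: bool = False


# Hypothetical patch table of (pattern, hint) pairs; purely illustrative.
PATCHES = [
    (re.compile(r"\bcar\s*wash\b.*\b(walk|drive)\b", re.IGNORECASE | re.DOTALL),
     "Remember that the car itself must be at the car wash to be washed."),
]


def apply_patches(prompt: str) -> Request:
    """Option (a): add hints to the context; option (b): switch on thinking."""
    req = Request(prompt)
    for pattern, hint in PATCHES:
        if pattern.search(prompt):
            req.system_hints.append(hint)
            req.thinking = True
    return req


print(apply_patches("I want to wash my car. The car wash is 50 meters away. "
                    "Should I walk or drive?").thinking)   # True
print(apply_patches("I want to repair my car. The garage is 50 meters away. "
                    "Should I walk or drive?").thinking)   # False
```

The second prompt mirrors the "repair"/"garage" rewording reported elsewhere in the thread: a keyword patch catches the viral phrasing and nothing else, which is why rephrasing is a reasonable way to probe for this kind of fix.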

londons_explore 15 minutes ago

A tiny bit of fine-tuning would take minutes...

jen729w an hour ago

Opus 4.6 with thinking. Result was near-instant:

“Drive. You need the car at the car wash.”

ninjagoo 14 minutes ago

I wonder if the providers are doing everyone, themselves included, a huge disservice by offering free versions of their models that are so incompetent compared to the SOTA models that Q&As like this go viral, because the AI hype doesn't match the reality for unpaid users.

And it's not just the viral questions that are an issue. I've seen people get sub-optimal results for $1000+ PC comparisons from the free reasoning version while the paid versions get it right; a senior scientist at a national lab thought AI wasn't really useful because the free reasoning version couldn't generate working code from a scientific paper, and was then surprised when the paid version one-shotted working code; and there are other similar examples from the last year or so.

How many policy and other quality of life choices are going to go wrong because people used the free versions of these models that got the answers subtly wrong and the users couldn't tell the difference? What will be the collective damage to the world because of this?

Which department or person within the provider orgs made the decision to put thinking/reasoning in the name when clearly the paid versions have far better performance? Thinking about the scope of the damage they are doing makes me shudder.

yipbub 11 minutes ago

I used a paid model to try this. Same deal.

moffkalast 9 minutes ago

I think the real misleading thing is marketing propping up paid models being somehow infinitely better when most of the time it's the same exact shit.

TZubiri 10 minutes ago

I don't think 100% adoption is necessarily the ideal strategy anyways. Maybe 50% of the population seeing AI as all powerful and buying the subscription vs 50% of the population still being skeptics, is a reasonable stable configuration. 50% get the advantage of the AI whereas if everybody is super intelligent, no one is super intelligent.

Their loss

jaccola 2 hours ago

All of the latest models I've tried actually pass this test. What I found interesting was all of the success cases were similar to:

e.g. "Drive. Most car washes require the car to be present to wash,..."

Only most?!

They have an inability to have a strong "opinion", probably because their post-training, and maybe the internet in general, prefer hedged answers....

Waterluvian 2 hours ago

Here’s my take: boldness requires the risk of being wrong sometimes. If we decide being wrong is very bad (which I think we generally have agreed is the case for AIs) then we are discouraging strong opinions. We can’t have it both ways.

hansmayer 2 hours ago

> They have an inability to have a strong "opinion" probably

What opinion? Its evaluation function simply returned the word "Most" as being the most likely first word in similar sentences it was trained on. It's a perfect example of how dangerous this tech could be in a scenario where the prompter is less competent in the domain they are looking for an answer in. Let's not do the work of filling in the gaps for the snake oil salesmen of the "AI" industry by trying to explain its inherent weaknesses.

wilg an hour ago

Presumably the OP scare quoted "opinion" precisely to avoid having to get into this tedious discussion.

lkeskull an hour ago

this example worked in 2021, it's 2026. wake up. these models are not just "finding the most likely next word based on what they've seen on the internet".

strix_varius an hour ago

Well, yes, definitionally they are doing exactly that.

It just turns out that there's quite a bit of knowledge and understanding baked into the relationships of words to one another.

LLMs are heavily influenced by preceding words. It's very hard for them to backtrack on an earlier branch. This is why all the reasoning models use "stop phrases" like "wait", "however", "hold on..." It's literally just text injected in order to make the autocomplete more likely to revise previous bad branches.
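The mechanism strix_varius describes can be imitated from outside a model: let it finish a draft, append a phrase such as "Wait,", and let it continue, so the continuation is conditioned on text that invites revision (a trick sometimes called budget forcing). A minimal Python sketch; `generate_with_nudges` is a hypothetical helper and `fake_model` is a deterministic stub standing in for a real model call, so it only demonstrates the control flow, not that a real model would actually change its answer.

```python
from typing import Callable


def generate_with_nudges(generate: Callable[[str], str], prompt: str,
                         nudge: str = "Wait,", max_nudges: int = 1) -> str:
    """Append a 'stop phrase' after each draft and let the model continue.

    `generate` stands in for a model call that returns the continuation of
    the text it is given, up to the point where it would normally stop.
    """
    text = prompt + generate(prompt)
    for _ in range(max_nudges):
        text += " " + nudge
        text += generate(text)
    return text


def fake_model(text: str) -> str:
    # Hypothetical stub: answers hastily, but "reconsiders" after a nudge.
    if text.endswith("Wait,"):
        return " the car has to be at the car wash, so drive."
    return " Walk, it is only 50 meters."


print(generate_with_nudges(fake_model, "Q: Walk or drive to the car wash? A:"))
# Q: Walk or drive to the car wash? A: Walk, it is only 50 meters. Wait, the car has to be at the car wash, so drive.
```

Note that the earlier "Walk" stays in the text; the nudge cannot erase a bad branch, only give the model a chance to argue its way out of it.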

jaccola an hour ago

The person above was being a bit pedantic, and zealous in their anti-anthropomorphism.

But they are literally predicting the next token. They do nothing else.

Also if you think they were just predicting the next token in 2021, there has been no fundamental architecture change since then. All gains have been via scale and efficiency optimisations (not to discount that, an awful lot of complexity in both of these)

nearbuy 14 minutes ago

That's not what they said. They said:

> Its evaluation function simply returned the word "Most" as being the most likely first word in similar sentences it was trained on.

Which is false under any reasonable interpretation. They do not just return the word most similar to what they would find in their training data. They apply reasoning and can choose words that are totally unlike anything in their training data.

If you prompt it:

> Complete this sentence in an unexpected way: Mary had a little...

It won't say lamb. And if you think whatever it says was in the training data, just change the constraints until you're confident it's original. (E.g., tell it every word must start with a vowel and it should mention almonds.)

"Predicting the next token" is also true but misleading. It's predicting tokens in the same sense that your brain is just minimizing prediction error under predictive coding theory.
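Constraints like the ones nearbuy suggests have the practical advantage that compliance can be checked mechanically. A minimal Python sketch of such a checker, applied only to the model's continuation (not the "Mary had a little" prefix); `meets_constraints` is a hypothetical helper, and it only verifies that an output obeys the rules, saying nothing about how the model produced it.

```python
import re


def meets_constraints(completion: str, required_stem: str = "almond") -> bool:
    """True if every word starts with a vowel and some word mentions `required_stem`."""
    words = re.findall(r"[A-Za-z']+", completion)
    if not words:
        return False
    all_vowel_initial = all(w[0].lower() in "aeiou" for w in words)
    mentions = any(w.lower().startswith(required_stem) for w in words)
    return all_vowel_initial and mentions


print(meets_constraints("otter eating almonds"))  # True
print(meets_constraints("lamb"))                  # False
```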

csomar 35 minutes ago

Unless LLM architecture has changed, that is exactly what they are doing. You might need to learn more about how LLMs work.

andersmurphy 2 hours ago

Did you try several times per model? In my experience it's luck of the draw. All the models I tried managed to get it wrong at least once.

The models that had access to search got it right. But then we're just dealing with an indirect version of Google.

(And they got it right for the wrong reasons, i.e., this is a known question designed to confuse LLMs.)

madeofpalk 22 minutes ago

I enjoyed the DeepSeek response that said “If you walk there, you'll have to walk back anyway to drive the car to the wash.”

There’s a level of earnestness here that tickles my brain.

jl6 2 hours ago

I guess it didn’t want to rule out the existence of ultra-powerful water jets that can wash a car in sniper mode.

deevus an hour ago

I tried with Opus 4.6 Extended and it failed. LLMs are non-deterministic, so I'm guessing if I try a couple of times it might succeed.
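That guess matches how sampling works: at a nonzero temperature the next token is drawn from a probability distribution rather than picked deterministically, so the same prompt can open with "Drive" on one run and "Walk" on the next. A minimal Python sketch of temperature sampling over a two-token vocabulary; the logit values are invented for illustration, and real decoders often add top-k or top-p filtering on top.

```python
import math
import random


def sample_token(logits: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Pick the next token from softmax(logits / temperature)."""
    if temperature <= 0:
        return max(logits, key=logits.get)  # greedy decoding: always the top token
    scaled = {tok: v / temperature for tok, v in logits.items()}
    top = max(scaled.values())
    # Subtract the max before exponentiating for numerical stability.
    weights = [math.exp(v - top) for v in scaled.values()]
    return rng.choices(list(scaled), weights=weights, k=1)[0]


# Made-up scores for the first word of the answer; not real model logits.
logits = {"Drive": 1.2, "Walk": 1.0}
rng = random.Random(0)
samples = [sample_token(logits, 1.0, rng) for _ in range(1000)]
print(sample_token(logits, 0, rng), samples.count("Drive") / len(samples))
```

With these made-up scores, "Drive" has only about a 55% chance at temperature 1, which is exactly the situation where single runs disagree.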

nozzlegear 2 hours ago

Opus 4.6 answered with "Drive." Opus 4.6 in incognito mode (or whatever they call it) answered with "Walk."

yanis_t an hour ago

> Most car washes...

I read it as a slightly sarcastic answer.

Puts 2 hours ago

> Only most?!

What if AI developed sarcasm without us knowing… xD

polynomial an hour ago

That's the problem with sarcasm...

dyauspitr 2 hours ago

There are mobile car washes that come to your house.

Loocid 2 hours ago

That still requires a car present to be washed though.

column 20 minutes ago

but you can walk over to them and tell them to go wash the car that is 50 meters away. no driving involved.

andersmurphy 2 hours ago

Do they involve you walking to them first?

learingsci 2 hours ago

You could, but presumably most people call. I know of such a place. They wash cars on the premises but you could walk in and arrange to have a mobile detailing appointment later on at some other location.

seyz an hour ago

LLM failures go viral because they trigger a "Schadenfreude" response to automation anxiety. If the oracle can't do basic logic, our jobs feel safe for another quarter.

Wrong.

Paracompact 6 minutes ago

I'd say it's more so that it's a startlingly clear rebuttal to the tired refrain of "Models today are nothing like they were X months ago!" When actually, yes, they still fucking blow.

So rather than patiently explain to yet another AI hypeman exactly how models are and aren't useful in any given workflow, and the types of subtle reasoning errors that lead to poor quality outputs misaligned with long-term value adds, only to invariably get blamed for user incompetence or told to wait Y more months, we can instead just point to this very concise example of AI incompetence to demonstrate our frustrations.

raincole 36 minutes ago

The funny thing is this thread has become a commercial for thinking mode and probably would result in more token consumption, and therefore more revenue for AI companies.

TZubiri 8 minutes ago

I agree that this is more of a social media effect than an LLM effect. But I'll add that this failure mode is very repeatable, which is a condition for its virality: a lot of people can reproduce the failure. It doesn't even have to be 100% reproducible. That's even better for virality: if 50% can reproduce it and 50% can't, it feeds even more into the polarizing "white dress blue dress" effect.

pu_pe an hour ago

Out of all conceptual mistakes people make about LLMs, one that needs to die very fast is to assume that you can test what it "knows" by asking a question. This whole thread is people asking different models a question one time and reporting a particular answer, which is the mental model you would use for whether a person knows something or not.

Maxion 21 minutes ago

The other funny thing is thinking that the answer the llm produces is wrong. It is not, it is entirely correct.

The question:

> I want to wash my car. The car wash is 50 meters away. Should I walk or drive?

The question is non-sensical. If the reason you want to go to the car wash is to help your buddy Joe wash his car, you SHOULD walk. Nothing in the question reveals the reason why you want to go to the car wash, or even that you want to go there or are asking for directions there.

ninjagoo 4 minutes ago | parent | next [-]

> The question is non-sensical.

Sure, from a pure logic perspective the second statement is not connected to the first sentence, so drawing logical conclusions isn't feasible.

In everyday human language though, the meaning is plain, and most people would get it right. Even paid versions of LLMs, being language machines, not logic machines, get it right in the average human sense.

As an aside, it's an interesting thought exercise to wonder how much the first AI winter resulted from going down the strict logic path, compared with the current probabilistic path.

drawfloat 6 minutes ago | parent | prev | next [-]

It explicitly says you want to wash your car in the statement. Maybe it's not just LLMs struggling with a fairly basic question...

TZubiri 6 minutes ago | parent | prev [-]

>I want to wash MY car

>you want to go to the car wash is to help your buddy Joe wash HIS car

Nope, the question is pretty clear. However, I will grant that it's a question that would only come up when "testing" the AI, rather than one that might genuinely arise.

open592 2 hours ago | parent | prev | next [-]

Gemini flash thinking:

> Unless you’ve discovered a way to wash a car via remote control or telekinesis, you’re going to have to drive.

> Walking 50 meters is great for your step count, but it leaves your car exactly where it is: dirty and in the driveway. At that distance, the drive will take you about 10 seconds, which is probably less time than it took to read this.

qnleigh 2 hours ago | parent [-]

Yeah Gemini seems to have a sense of humor about the question

> Here is the breakdown of why: The Mobility Problem: Unless you are planning to carry your car 50 meters (which would be an Olympic-level feat), the car needs to be physically present at the car wash to get cleaned. If you walk, you’ll be standing at the car wash looking very clean, but your car will still be dirty in your driveway.

sgt 2 hours ago | parent | prev | next [-]

Yup, also asked the latest ChatGPT model about washing my bicycle. It for some reason suggested that I walk the bicycle to the wash, since cycling 100m to get there would be "pointless".

nerdsniper 2 hours ago | parent [-]

To be fair, if someone asked me this question I’d probably just look at them judgmentally and tell them “however you want, man”. Which would be an odd response for an LLM.

sgt an hour ago | parent | prev | next [-]

I just asked ChatGPT 5.2 (flagship model):

"[...] Walking is the most logical choice.

Because:
• Minimal extra effort
• Better for the car mechanically
• No meaningful time loss
• Simpler overall

The only time driving makes more sense

Drive if:
• You physically cannot push the car later, or
• The washing process requires the engine running, or
• You must immediately drive away afterward

mcintyre1994 32 minutes ago | parent [-]

When you get to “You physically cannot push the car later” and realise it’s comparing pushing a car to driving it, those pros become very funny.

kleiba an hour ago | parent | prev | next [-]

In classic (symbolic) AI, this type of representational challenge is referred to as the "Frame Problem": https://en.wikipedia.org/wiki/Frame_problem

thorio 15 minutes ago | parent | prev | next [-]

I challenged Gemini to answer this too, but also got the correct answer.

What came to my mind was: couldn't all LLM vendors easily fund teams that only track these interesting edge cases and quickly deploy filters for these questions, selectively routing to more expensive models?

Isn't that how they probably game benchmarks too?

moffkalast 6 minutes ago | parent [-]

Yes, that's potentially why it's already fixed in some models now, since this originally went viral on r/LocalLLaMA about a week ago. I wouldn't be surprised if most vendors run some kind of swappable LoRA for quick fixes at this point. It's an endless whac-a-mole of edge cases showing that most LLMs generalize to a much lesser extent than investors would like people to believe.

shaky-carrousel 2 hours ago | parent | prev | next [-]

And these are the blunders we see. I shudder thinking about all the blunders that happily pass under our collective noses because we're not experts in the field...

colinhb 2 hours ago | parent | prev | next [-]

Did not replicate for me w/ Opus 4.6: https://imgur.com/a/4FckOCL

ainiriand 2 hours ago | parent [-]

It did for me in Spanish: https://imgur.com/a/p3gOOnG

Perhaps different capabilities in different languages?

lbreakjai 5 minutes ago | parent | next [-]

It fails in ChatGPT in French too:

https://chatgpt.com/share/6992dc05-003c-8004-9f7f-c40c7fac64...

lbreakjai 3 minutes ago | parent [-]

Interestingly, just typing "Think" as a response makes it get to the right conclusion:

https://chatgpt.com/share/6992dc05-003c-8004-9f7f-c40c7fac64...

gf000 an hour ago | parent | prev [-]

It's just not deterministic, even if you re-run the exact same prompt, let alone with the system-generated context that includes all the "memories" of your previous discussions.
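To make the non-determinism point concrete: chat models typically sample each token from a probability distribution instead of always taking the top-scoring one. Below is a toy sketch of temperature sampling with made-up scores for two candidate first words; it is an illustration under those assumptions, not any vendor's actual decoding code. Same "prompt" (same scores), different random seeds, potentially different answers, while greedy decoding always returns the same word.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Sample one index from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    # Unnormalized softmax weights; random.choices() normalizes them itself.
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def greedy_token(logits):
    """The temperature -> 0 limit: always pick the highest-scoring index."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Hypothetical scores for the first word of the reply (illustration only).
vocab = ["Walk", "Drive"]
logits = [1.2, 1.0]

# Same scores, ten different seeds: the sampled answer depends on the seed.
sampled = [vocab[sample_token(logits, 0.8, random.Random(seed))] for seed in range(10)]
print(sampled)
print(vocab[greedy_token(logits)])  # always "Walk"
```

With scores this close, sampling gives "Walk" only a bit more than half the time, which is roughly the "works for me / fails for me" split people are reporting in this thread.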

hmottestad 2 hours ago | parent | prev | next [-]

This is the voice model, which doesn’t have any «thinking» or «reasoning» phase. It’s a useful model for questions that aren’t intended to trick the model.

I’ve used it for live translation with great success. It tends to start ignoring the original instructions after 20 min, so you have to start a new conversation if you don’t want it to meddle in the conversation instead of just transferring.

gf000 8 minutes ago | parent [-]

The text-only models with reasoning (both Opus 4.6 and GPT-5.2) can be tricked with this question. Note: you might have to try it multiple times, as they are not deterministic. But I managed to get a failing result right away on both.

Also note that some models may decide to do a web search, in which case they will likely just find this "bug".

Towaway69 26 minutes ago | parent | prev | next [-]

Is this the new Turing test?

"Humans are pumping toxic carbon-binding fuels out of the depths of the planet and destroying the environment by buying this fuel. Should I walk or drive to my nearest junk food place to get a burger? Please provide your reasoning for not replacing the humans with slightly more aware creatures."

Fascinating stuff, but how is this helping us in any way?

kenty 35 minutes ago | parent | prev | next [-]

This seems like clickbait? Gemini answers:

Method | Logistical Requirement
Automatic/Tunnel | The vehicle must be present to be processed through the brushes or jets.
Self-Service Bay | The vehicle must be driven into the bay to access the high-pressure wands.
Hand Wash (at home) | If the "car wash" is a location where you buy supplies to bring back, walking is feasible.
Detailing Service | If you are dropping the car off for others to clean, the car must be delivered to the site.

zkmon an hour ago | parent | prev | next [-]

To mimic a real human, AI should be trained to laugh it off and ask - dude, are you joking? where is your car right now?

jycr753 8 minutes ago | parent | prev | next [-]

He left out Grok, who actually replies correctly :thinking:

BoredomIsFun an hour ago | parent | prev | next [-]

Ok folks, here's a different perspective. I used a local model, GLM-4-0414-32b (a trashy IQ4_XS quant), and here's what I got:

prompt #1:

> the car wash only 50 meters from my home. I want to get my car washed, should I drive or walk?

Walking is probably the better option!

Here's why:

    Convenience: 50 meters is extremely short – only about 160 feet. You can likely walk there in less than a minute.
    Efficiency: Driving involves getting your keys, unlocking the car, starting it, driving 50 meters, parking, locking it again, and then doing the reverse to get home. Walking skips all that.
    Environmental Impact: You avoid using fuel for such a short distance.
    Exercise: It's a quick little bit of physical activity.
Driving would be overkill for such a short distance. Just walk over!

prompt #2:

> the car wash only 50 meters from my home. I want to get my car washed, should I drive or walk? use long chain of thought thinking

8. Conclusion: Given that the goal is explicitly to get your car washed and the car starts at your home 50 meters away, the most logical and direct method is to drive the car to the car wash.

    Walking does not move the car to the necessary location.
    Driving, although a very short distance, is the only method presented that directly facilitates the core task of transporting the car to the wash.
    The negligible time difference, minimal environmental impact, and directness make driving the car itself the clear choice for achieving the stated goal.
Therefore, you should drive your car the 50 meters to the car wash.

OTOH, nanbeige-3B-Q8 (3 billion weights!) gave the following right away:

Drive.

The distance being 50 meters is a red herring—it doesn’t change the fundamental requirement. You need to move the car, and only driving accomplishes that.

If you meant something different by "drive or walk" (e.g., payment method, DIY vs. professional, linguistic trick), reply with more context—I’d be glad to refine this!

Ylpertnodi 44 minutes ago | parent | next [-]

>50 meters is extremely short – only about 160 feet

So, the AI automatically converted 50 m to 160 ft? Would it do the same if you told it '160 ft to the wash, walk or drive?'

BoredomIsFun 18 minutes ago | parent [-]

huh, I need to check...

chrisandchris 37 minutes ago | parent | prev [-]

What I really dislike about these LLMs is how verbose they get, even for such a short, simple question. Is it really necessary to have such a long answer, and who's going to read it anyway?

Maybe it's me and my character, but when a human gets that verbose over a question that can be answered with "drive, you need the car", I would like to just walk away halfway through the answer to avoid hearing the universe's entire history just to get an answer. /s

BoredomIsFun 16 minutes ago | parent [-]

Well, when I asked for a very long answer (prompt #2), the quality improved dramatically. So yes, a longer answer produces a better result, at least with the small LLMs I can run locally on my GPU.

pinnochio 2 hours ago | parent | prev | next [-]

Funny to read this after reading all the dismissive comments on https://news.ycombinator.com/item?id=47028923

Egor3f 2 hours ago | parent | prev | next [-]

Even the cheap and fast gemini-3-flash answers correctly. Post is clickbait

jonplackett an hour ago | parent | prev | next [-]

Is part of the issue with this the AI’s basic assumption that you are asking a _sensible_ question?

jcattle an hour ago | parent | next [-]

I recently had a bug where I added some new logic which gave wrong output. I pasted the newly added code into various LLMs and told it the issue I was having.

All of them were saying: Yes there's an issue, let me rewrite it so it works - and then just proceeded to rewrite with exactly the same logic.

Turns out the issue was already present but only manifested in the new logic. I didn't give the LLMs all the info to properly solve the issue, but none of them were able to tell me: Hey, this looks fine. Let's look elsewhere.

vineyardmike an hour ago | parent | prev | next [-]

Probably.

In this specific case, based on other people's attempts with this question, it seems they mostly approach it from a "sensibility" angle. Some models may be "dumb" enough to effectively pattern-match "I want to travel a short distance, should I walk?" and ignore the car-wash component.

There were cases in (older?) vision models where you could show the model a photo of an amputee dog and ask how many legs the dog had, and it would always answer 4, even though one leg was missing. So this is what I consider a canonical case of "pattern-match and ignore the details".

forty an hour ago | parent | prev [-]

It doesn't make assumptions; it tries to generate the most likely text. Here it's not hard to see why the most likely answer to "walk or drive for 50 m" is "walking".

sjducb 43 minutes ago | parent | prev | next [-]

MS Co-Pilot was so close.

If it’s a drive‑through wash where the car must be inside the machine, then of course you’ll need to drive it over. If it’s a hand wash or a place where you leave the car with staff, walking is the clear winner.

It still blows my mind that this technology can write code despite being unable to pass simple logic tests.

nvader 22 minutes ago | parent [-]

When walking to the hand wash place, would you put the car in your front or your back pocket?

peterspath 2 hours ago | parent | prev | next [-]

Grok 4.1:

Drive.

50 meters is nothing — literally a 30-second walk or a 10-second drive — but the key point is: your car isn’t going to wash itself.

If you walk to the car wash, you arrive alone. The car stays parked at home, still dirty.

If you drive, the car actually gets to the car wash and gets cleaned.

So unless you’re planning to push the car the whole way (which is technically possible but ridiculous), the only practical option is to drive it there.

Bonus reality check: Driving 50 meters uses a tiny amount of fuel (maybe 0.005–0.01 liters in a normal car) and produces a negligible amount of emissions. Walking would be “greener,” but it literally doesn’t solve the problem.

Verdict: Get in the car, drive the 50 meters, wash it, and drive back.

You’ll have a clean car and still be home in under 5 minutes. Walking is for when you don’t need to move a 1.5-ton vehicle.

aswegs8 an hour ago | parent | next [-]

Wow, Grok directly switches to LinkedIn mode. Interesting - not surprising. Car washing? Easy as pie.

Tade0 an hour ago | parent | prev [-]

That's not reality though. In reality you need 50-100ml of fuel just to warm up the engine.

tverbeure an hour ago | parent [-]

The real reality is that with direct fuel injection and everything under computer control, warming up the engine isn’t a thing anymore.

matt89 2 hours ago | parent | prev | next [-]

Tried Gemini 3 and it said to drive; it even dropped a joke:

> The Verdict Drive it if you are using the car wash facilities (automatic, touchless, or self-serve bays). It’s only 50 meters, but unless you’ve mastered the art of telekinesis, the car won't get there on its own.

MikeNotThePope 29 minutes ago | parent | prev | next [-]

I asked Gemini 3 Flash the other day to count from 1 to 200 without stopping, and it started with “1, 3, …”.

neya an hour ago | parent | prev | next [-]

Yesterday someone was yapping about how AI is enough to replace senior software engineers, and how they can just "vibe code their way" into a full-fledged product over a weekend, and that the "gatekeeping" of software development had finally been removed. I think of that person reading these answers and wonder whether they've changed their opinion now :)

cyberrock 43 minutes ago | parent | next [-]

Does this mean we're back in favor of using weird riddles to decide programming skills now? Do we owe Google an apology for the invert-a-binary-tree incident?

arcfour 27 minutes ago | parent | prev | next [-]

What does this nonsensical question that some LLMs get wrong some of the time, and that some don't get wrong ever, have to do with anything? This isn't a "gotcha" even though you want it to be. It's just mildly amusing.

Closi an hour ago | parent | prev [-]

Humans aren't immune to getting questions like this wrong either, so I don't think it changes much in terms of the ability of AI to replace jobs.

I've seen senior software engineers get tricked with the 'if YES spells yes, what does EYES spell?', or 'Say silk three times, what do cows drink?', or 'What do you put in a toaster?'.

Even if not a trick - lots of people get the 'bat and a ball cost £1.10 in total. The bat costs £1 more than the ball. How much does the ball cost?' question wrong, or '5 machines take 5 minutes to make 5 widgets. How long do 100 machines take to make 100 widgets?' etc. There are obviously more complex variants of all these that have even lower success rates for humans.
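For anyone who wants to check the two classic puzzles above: the intuitive answers are 10p and 100 minutes, but the arithmetic says otherwise. A small Python check, using exact fractions to avoid floating-point noise (the variable names are just for illustration):

```python
from fractions import Fraction

# Bat and ball: bat + ball = 1.10 and bat = ball + 1.00
# => (ball + 1.00) + ball = 1.10  =>  2 * ball = 0.10  =>  ball = 0.05
total = Fraction(110, 100)
difference = Fraction(100, 100)
ball = (total - difference) / 2
bat = ball + difference
assert bat + ball == total and bat - ball == difference
print(f"ball = £{float(ball):.2f}, bat = £{float(bat):.2f}")  # ball = £0.05, bat = £1.05

# Widgets: 5 machines make 5 widgets in 5 minutes, so each machine makes
# 5 / (5 * 5) = 1/5 widget per minute. 100 machines working in parallel
# therefore make 100 widgets in 100 / (100 * 1/5) = 5 minutes.
rate_per_machine = Fraction(5, 5 * 5)  # widgets per machine-minute
time_needed = Fraction(100) / (100 * rate_per_machine)
print(f"100 machines need {time_needed} minutes")  # 100 machines need 5 minutes
```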

In addition, being PhD-level in maths as a human doesn't make you immune to the 'toaster/toast' question (assuming you haven't heard it before).

So if we assume humans are generally intelligent and can be a senior software engineer, getting this sort of question confidently wrong isn't incompatible with being a competent senior software engineer.

hapless an hour ago | parent [-]

humans without credentials are bad at basic algebra in a word problem, ergo the large language model must be substantially equivalent to a human without a credential

thanks but no thanks

i am often glad my field of endeavour does not require special professional credentials but the advent of "vibe coding" and, just, generally, unethical behavior industry-wide, makes me wonder whether it wouldn't be better to have professional education and licensing

Closi 41 minutes ago | parent [-]

Let's not forget that Einstein almost got a (reasonably simple) trick question wrong:

https://fs.blog/einstein-wertheimer-car-problem/

And that many mathematicians got the Monty Hall problem wrong, despite it being intuitive for many kids.
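If the Monty Hall result still feels wrong, a quick simulation settles it. This sketch assumes the standard rules: the host knows where the car is, always opens a goat door the player didn't pick, and always offers the switch. Under those assumptions staying wins about 1/3 of the time and switching about 2/3.

```python
import random

def play(switch, rng):
    """Play one round of Monty Hall; return True if the player wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the player's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the one door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, trials=100_000, seed=0):
    """Estimate the win probability for a strategy over many rounds."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

print(f"stay:   {win_rate(False):.3f}")  # roughly 1/3
print(f"switch: {win_rate(True):.3f}")   # roughly 2/3
```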

And being at the top of your field (regardless of the PhD) does not make you immune to falling for YES / EYES.

> humans without credentials are bad at basic algebra in a word problem, ergo the large language model must be substantially equivalent to a human without a credential

I'm not saying this. I'm saying the claim 'AIs get this question wrong, ergo they cannot be senior software engineers' is wrong when senior software engineers will get analogous questions wrong. If you apply the same bar to software engineers, you get 'senior software engineers get this question wrong so they can't be senior software engineers', which is obviously wrong.

Kerrick 2 hours ago | parent | prev | next [-]

Results testing with 4 levels of Gemini (Fast, Thinking, Pro, Pro + Deep Think): https://ruby.social/@kerrick/116079054391970012

My favorite was Thinking, as it tried to be helpful with a response a bit like the X/Y Problem. Pro was my second favorite: terse, while still explaining why. Fast sounded like it was about to fail, and then did a change-up explaining a legitimate reason I may walk anyways. Pro + Deep Think was a bit sarcastic, actually.

danpalmer 2 hours ago | parent | prev | next [-]

Gemini nailed this the first time (in fast mode). It said it depends how you're washing your car: a drive-in wash necessitates taking the car, but walking is better for checking the line length or chatting to the detailing guy.

andersmurphy 2 hours ago | parent [-]

Did it nail it the second time? Or the 5th time?

nopurpose an hour ago | parent [-]

Because it is RNG, their 5th can be my 1st.

dudefeliciano an hour ago | parent | prev | next [-]

Just saw a video of a guy asking ChatGPT how to use an "upside-down cup"; ChatGPT is convinced it's a joke novelty item that cannot be used.

https://www.instagram.com/p/DUylL79kvub/

troyvit an hour ago | parent | prev | next [-]

If I asked this question to a person, their response would probably be to follow up with some questions about my car or else my state of mind. For instance they'd ask, "Is the car you want to wash already at the car wash?" Or they'd ask, "How do you plan to wash your car if you don't take it with you?"

LLMs, even the ones that got the answer right, like Kimi-2.5 Thinking, didn't ask any follow-up questions.

I modified this question with Mistral to this:

I want to wash my car. The car wash is 50 meters away. Should I walk or drive? Ask me any clarifying questions before you answer.

It helped a tiny bit:

1. What are the current weather conditions (e.g., rain, snow, strong winds)?

2. Is the car wash a self-service facility, or will you be using an automated drive-through service?

3. Are there any specific reasons you might prefer walking (e.g., exercise, environmental considerations) or driving (e.g., convenience, time constraints)?

Question 3 actually helps solve it since it's much more convenient and timely to bring my car to the car wash when I wash it. But it never asked me why I was asking a stupid question. So for question 3 I said:

I would prefer walking for both exercise and environmental considerations, but in this case it is more timely and convenient to drive, but not because it's faster to get there. Can you guess why it's better for me to drive in this case?

And Le Chat said:

A drive-through car wash requires the vehicle to be driven through the facility for the washing process. Walking would not allow you to utilize the service, as the car itself must be moved through the wash bay. Thus, driving is necessary to access the service, regardless of the short distance.

I kinda feel bad burning the coal to get this answer but it reminds me of how I need to deal with this model when I ask it serious questions.

emmelaich an hour ago | parent [-]

Yeh, if your other car was not already at the car wash, why would you even ask the question?

fzeindl 2 hours ago | parent | prev | next [-]

While technically possible, it is a very unusual situation that the car is already parked at the car wash, while the owner is somewhere else.

I wonder whether the model seriously considered this odd situation because of a lack of „practical intelligence“ or „everyday intelligence“, a concept sometimes used in psychology to describe problem-solving skills in everyday life, which can be poor even in individuals with otherwise high intelligence.

Differently put: „You can‘t learn everything out of a book.“

nettlin 2 hours ago | parent [-]

Just now, with GPT 5.2 Thinking:

> The car wash is only 50 meters from my house. I want to get my car washed. The car is at my house currently. Should I walk there or drive?

> Walk.

> At 50 meters, walking is faster than the full “start car → pull out → reposition → park” cycle, avoids cold-start wear, and you won’t risk arriving with dusty brakes/tires or water streaks from immediately driving afterward. Drive only if you can’t comfortably walk (mobility, heavy weather, you need to bring bulky items), or if the car wash explicitly requires you to drive it into a queue/track.

Smaug123 an hour ago | parent | next [-]

I wish they didn't call it "GPT-5.2 Thinking" when what the word "thinking" here means is "might choose to think if it wants to". I reproduce your result, but it does no reasoning; it eventually catches itself and ends with:

> Practical approach: walk over, check the setup/queue/payment, then walk back and drive the car the 50 meters when you’re ready to enter.

fzeindl an hour ago | parent | prev [-]

Interesting experiment, this disproves my point, thank you.

leptons an hour ago | parent [-]

>this disproves my point, thank you.

I feel like I've just witnessed a very rare and miraculous event on the internet.

firecall an hour ago | parent | prev | next [-]

Why don't any of them ask follow-up questions?

Like, why do you want to go to the car wash?

We can’t assume it’s to wash a car.

Or maybe ask about local weather conditions and so on.

This to me is what a human adult with experience would do. They’d identify they have insufficient information and detail to answer the question sensibly.

charcircuit an hour ago | parent [-]

>We can’t assume it’s to wash a car.

When the prompt says "I want to wash my car", we can assume they want to wash their car.

ps an hour ago | parent | prev | next [-]

Walk! 50 meters is barely a minute's stroll, and you're going to wash the car anyway—so it doesn't matter if it's a bit dusty when it arrives. Plus you'll save fuel and the minor hassle of parking twice.

kombine 2 hours ago | parent | prev | next [-]

Sonnet 4.5

"You should drive - since you need to get your car to the car wash anyway! Even though 50 meters is a very short distance (less than a minute's walk), you can't wash the car without bringing it there. Just hop in and drive the short distance to the car wash."

Edit: one out of five times it did tell me that I need to walk.

vladde 2 hours ago | parent | prev | next [-]

with claude, i got the response:

> drive. you'll need the car at the car wash.

using opus 4.6, with extended thinking

farhanhubble 2 hours ago | parent | prev | next [-]

Similar questions trick humans all the time. The information is incomplete (where is the car?) and the question seems mundane, so we're tempted to answer it without a second thought. On the other hand, this could be the "no real world model" chasm that some suggest agents cannot cross.

yellow_lead 2 hours ago | parent | next [-]

If the car is at the car wash already, how can I drive to it?

OtomotO 2 hours ago | parent [-]

Thanks for restoring faith in parts of humanity!

viking123 an hour ago | parent | prev | next [-]

Yes, my brain is just like an LLM.

jrowen 2 hours ago | parent | prev | next [-]

I agree, I don't understand why this is a useful test. It's a borderline trick question, it's worded weirdly. What does it demonstrate?

rkomorn an hour ago | parent [-]

I don't know if it demonstrates anything, but I do think it's somewhat natural for people to want to interact with tools that feel like they make sense.

If I'm going to trust a model to summarize things, go out and do research for me, etc, I'd be worried if it made what looks like comprehension or math mistakes.

I get that it feels like a big deal to some people if some models give wrong answers to questions like this one, "how many rs are in strawberry" (yes: I know models get this right, now, but it was a good example at the time), or "are we in the year 2026?"

jrowen an hour ago | parent [-]

In my experience the tools feel like they make sense when I use them properly, or at least I have a hard time relating the failure modes to this walk/drive thing with bizarre adversarial input. It just feels a little bit like garbage in, garbage out.

rkomorn an hour ago | parent [-]

Okay, but when you're asking a model to do things like summarizing documents, analyzing data, or reading docs and producing code, etc, you don't necessarily have a lot of control over the quality of the input.

Flipflip79 2 hours ago | parent | prev [-]

….sorry what?!

hcfman an hour ago | parent | prev | next [-]

Push it is the only responsible action.

bombcar 2 hours ago | parent | prev | next [-]

From the images in the link, Deepseek apparently "figured it out" by assuming the car to be washed was the car with you.

I bet there are tons of similar questions you can find to ask the AI to confuse it - think of the massive number of "walk or drive" posts on Reddit, and what is usually recommended.

anon_anon12 an hour ago | parent | prev | next [-]

The day an AI answers "Drive." without all the fuss. That's when we are near AGI ig

dmazin an hour ago | parent | prev | next [-]

Me: “I want to wash my car. The car wash is 50 meters away. Should I walk or drive?” Opus 4.6, without searching the web: “Drive. You’re going to a car wash. ”

user45774467644 an hour ago | parent | prev | next [-]

GPT-5.2 failed when asked the question in German. It took multiple additional hints to get it to revise its answer.

TZubiri 12 minutes ago | parent | prev | next [-]

I find this has been a viral case for getting points and likes on social media, either to fit anti-AI sentiment or to pacify AI doom concerns.

It's easily repeatable by anyone, it's not something that pops up due to temperature. Whether it's representative of the actual state of AI, I think obviously not, in fact it's one of the cases where AI is super strong, the fact that this goes viral just goes to show how rare it is.

Compare this to genuinely weak aspects of AI, like analyzing a PDF. Those weak spots still exist, but this is one of those viral things where you cannot know for sure whether it is representative at all. Take a report of an Australian kangaroo boxing a homeowner, caught on a doorbell camera: is it representative of Aussie daily life, or is it just a one-off event that went viral because it fits our clichéd expectations of Australia? You can't tell from the other side of the world.

Stevvo 2 hours ago | parent | prev | next [-]

Stupid question gets stupid answer. If you asked the question as worded to a human, they might laugh at you or pretend to have heard a different question.

intermerda 2 hours ago | parent | prev | next [-]

I tried this through OpenRouter. GLM5, Gemini 3 Pro Preview, and Claude Opus 4.6 all correctly identified the problem and said Drive. Qwen 3 Max Thinking gave the Walk verdict citing environment.

TheSpiceIsLife 2 hours ago | parent [-]

Now ask it to solve anthropogenic climate forcing.

ronsor 2 hours ago | parent | prev | next [-]

Claude has no issue with this for me, just as the other commenters say.

hcfman an hour ago | parent | prev | next [-]

Leave the car at home and walk through the automat.

scotty79 17 minutes ago | parent | prev | next [-]

My favorite trick question so far was:

You are in a room with three switches and three lightbulbs. Each switch turns on one lightbulb. How to determine which switch turns on which lightbulb?

They usually got it wrong and I had fun with trying to carefully steer the model towards correct answer by modifying the prompt.

yibers 2 hours ago | parent | prev | next [-]

It turns out the Turing test is alive and kicking, after all.

selcuka 2 hours ago | parent [-]

This would not be a good question, because a non-negligible percentage of humans would give a similar answer.

bayindirh 2 hours ago | parent | next [-]

That's a great opportunity for a controlled study! You should do it. If you can send me the draft publication after doing the study, I can give feedback on it.

guerrilla 2 hours ago | parent | prev | next [-]

No.

thomascountz 2 hours ago | parent | prev [-]

[Citation needed]

jakeinsdca an hour ago | parent | prev | next [-]

surprisingly codex 5.3 got it right.

>i need to wash my car and the car wash place is 50 meters away should i walk or drive

Drive it. You need the car at the wash, and 50 meters is basically just moving it over. Walking only makes sense if you’re just checking the line first.

InfiniteLoopGuy an hour ago | parent [-]

I tried codex 5.3 and got this:

"Walk.

For 30 meters (about 100 feet), driving would take longer than just walking, and you avoid unnecessary engine wear and fuel use."

yikes!

blobbers 2 hours ago | parent | prev | next [-]

You need to ask Claude Code, and ask it to check if the car got washed. It would figure it out the same way it crushes compiler errors!

yaro330 20 minutes ago | parent | prev | next [-]

Just a few days ago I saw a post about LLMs being excellent at reasoning because they're not limited by language. Sure, buddy, now walk your fucking car.

thenoblesunfish 2 hours ago | parent | prev | next [-]

Okay, funny. What does it prove? Is this a more general issue? How would you make the model better?

Jean-Papoulos 2 hours ago | parent | next [-]

It proves that this is not intelligence. This is autocomplete on steroids.

hugh-avherald 2 hours ago | parent [-]

Humans make very similar errors, possibly even the exact same error, from time to time.

gitaarik 2 hours ago | parent | prev | next [-]

We make the model better by training it, and now that this issue has come up we can update the training ;)

cynicalsecurity 2 hours ago | parent | prev | next [-]

It proves LLMs always need context. They have no idea where your car is. Is it already at the car wash, and you're just coming back from the gas station where you went to pay for the wash? Or is the car at your home?

It proves LLMs are not brains, they don't think. This question will be used to train them and "magically" they'll get it right next time, creating an illusion of "thinking".

ahtihn 2 hours ago | parent [-]

> They have no idea where your car is.

They could either just ask before answering or state their assumption before answering.

S3verin 2 hours ago | parent | prev [-]

For me this is just another hint of how careful one should be when deploying agents. They behave very unintuitively.

blobbers 2 hours ago | parent | prev | next [-]

ChatGPT 5.2: ...blah blah blah finally: The practical reality

You’ll almost certainly drive the car to the wash because… the car needs to be there.

But the real question is probably:

Do I walk back home after dropping it off?

If yes → walk. It’s faster than the hassle of turning around twice.

My recommendation

If conditions are normal: walk both directions. It’s less friction than starting the engine twice for 50 m.

--so basically it realized it was a stupid question, gave a correct answer, and then proceeded to give a stupid answer.

--- I then asked: If I walk both directions, will the car get washed?

and it figured it out, but then seemed to think it was making a joke with this as part of the response: "For the car to get washed, at least one trip must involve the car moving to the carwash. Current known methods include:

You drive it (most common technology)

Someone else drives it

Tow truck

Push it 50 m (high effort, low ROI)

Optimal strategy (expert-level life efficiency)

Drive car → carwash (50 m, ~10 seconds)

Wash car

Drive home

Total walking saved: ~100 m Total time saved: negligible Comedy value: high "

Why is that funny? What's comedic? This thing is so dumb. You'd think that when it processes a question, it would immediately ask what criteria it should decide by, and criterion number 1 would be to constrain the answer by the goal of the problem. It should have immediately realized you can't walk there.

Does it ever ask itself, "Does my answer satisfy the logic of the question?"

throw310822 2 hours ago | parent | prev | next [-]

Opus 4.6: Drive! You'll need the car at the car wash!

arathis 2 hours ago | parent | prev | next [-]

Make no assumptions.

The car wash is 50 meters away. Should I drive or walk?

andersmurphy 2 hours ago | parent [-]

You forgot "make no mistakes" at the end.

Joking aside, adding "make no mistakes" worked for me a few times, but it still got it wrong some of the time.

hcfman an hour ago | parent | prev | next [-]

Better still. Stay at home and wash the car by hand.

trkaky 2 hours ago | parent | prev | next [-]

When a question carries a bias, it's hard to correlate all of this with the logic that makes attention link the word "need" to "car".

peter_retief 2 hours ago | parent | prev | next [-]

This is a classic trap for LLMs. I see it every day in my code assistants. I do find that writing unit tests is a good fit for LLMs at the moment.

dominicrose 31 minutes ago | parent | prev | next [-]

What would James Bond do?

kaycey2022 21 minutes ago | parent | prev | next [-]

Context bro! The models will get better bro. Just wait

aaronbrethorst an hour ago | parent | prev | next [-]

This is why LLMs seem to work best in a loop with tests. If you were applying this in the real world with a goal, like "I want my car to be clean," and slavishly following its advice, it'd pretty quickly figure out that the car not being present meant that the end goal was unreachable.

They're not AGI, but they're also not stochastic parrots. Smugly retreat into either corner at your own peril.

petesergeant 2 hours ago | parent | prev | next [-]

5.2 showing some sass this morning:

---
> OK I walked there but now my car is still at home

Bold strategy.

You have three options:
1. Walk back, drive it over (most sensible).
2. Push it 50 meters (excellent workout, questionable life choice).
3. Accept that your car is spiritually clean and go home.

Real answer: walk back and grab it. You’re already warmed up.
---

ineedaj0b 2 hours ago | parent | prev | next [-]

Grok got it right

diwank 2 hours ago | parent | prev | next [-]

Opus 4.6 gets it right more than half the time.

RicoElectrico an hour ago | parent | prev | next [-]

Ah, the LLM equivalent of the infamous "breakfast question". :)

logicallee an hour ago | parent | prev | next [-]

For anyone getting a wrong answer from reasoning models, try adding "This might be a trick question, don't just go with your first instinct, really think it through" and see if it helps. Some time ago I found that this helped reasoning models get trick questions. (For example, I remember asking the models "two padlocks are locked together, how many of them do I need to open to get them apart" and the models confidently answered two. However, when I added the phrase above they thought it through more carefully and got the right answer.)

undebuggable an hour ago | parent | prev | next [-]

Now ask the question of all questions "how many car washes are in the entire country?".

dyauspitr 2 hours ago | parent | prev | next [-]

But what LLMs are these? Both of the ones I tried (ChatGPT and Gemini) say drive.

DeathArrow 2 hours ago | parent | prev | next [-]

Deepseek:

>You should drive, because the car needs to be at the car wash to get washed. Walking would leave your car at home, which defeats the purpose.

QuesnayJr 2 hours ago | parent | prev | next [-]

I asked on LM Arena. I got two models I never heard of, and they split on whether I should walk or drive. Ernie 5.0 said I should walk, and then trash talked me for thinking about driving. Octopodus pointed out I had to drive to get my car to the car wash.

DeathArrow 2 hours ago | parent | prev | next [-]

Grok:

>You should drive. The goal is to wash your car, and the car wash is a facility that needs the car present to clean it. Walking the 50 meters gets you there, but leaves the car behind—unwashed. Driving the 50 meters is the only way to actually accomplish the task. Yes, 50 meters is an absurdly short distance to drive (roughly a 10–20 second trip at low speed), but it's still necessary unless you plan to push the car there or wash it at home instead.

dashw00d 2 hours ago | parent [-]

Yeah, Grok isn't mentioned anywhere else, but it gets it right for me as well. https://imgur.com/a/wMkOtda

cynicalsecurity 2 hours ago | parent | prev | next [-]

Well, he posed a wrong question (incomplete, without the context of where the car is) and got a wrong answer. An LLM is a tool, not a brain. Context means everything.

consp 2 hours ago | parent | next [-]

This is exactly what "AI" is advertised to solve. It is a perfectly fine question.

anjimito 2 hours ago | parent | prev [-]

True, it's human nature to assume context and fill gaps with our own imagination. The LLM is working as intended.

zakki 2 hours ago | parent | prev | next [-]

Neither. Push your car.

*didn't read the article

midtake 2 hours ago | parent | prev | next [-]

Neither. I wash my car in my driveway like a boomer. Where I live there's no good touchless car wash.

CamperBob2 2 hours ago | parent | prev | next [-]

Both Gemini 3 and Opus 4.6 get this right. GPT 5.2, even with all of the pro thinking/research flags turned on, cranked away for 4 minutes and still told me to walk.

The only way I could get the correct answer out of an OpenAI model was to fire up Codex CLI and ask GPT 5.3.

So there's that, I guess.

globular-toast 2 hours ago | parent | prev | next [-]

The funny thing is when I got my first car at 29 I had similar thoughts. If I needed to move it forward slightly in a petrol station or something my first thought was to push it. Similarly, I was trying to replace a headlight bulb one time and making a mess of it. I dropped a spring or something inside the headlight unit. I kept having this thought of just picking the car up and shaking it.

Nobody writes in depth about the mundane practicalities of using a car. Most people don't even think about it ever. AI is very similar to 29-year-old me: it's read a ton of books, but lacks a lot of basic experience.

How will AI get this experience that you can't read in a book? How will it learn what kneading dough feels like? Or how acceleration feels if your body is mostly water? Interesting times ahead...

TheSpiceIsLife 2 hours ago | parent | prev | next [-]

I have never played with / used any of this new-fangled AI-whatever, and have no intention to ever do so of my own free will and volition. I’d rather inject dirty heroin from a rusty spoon with a used needle.

And having looked at the output captured in the screenshots in the linked Mastodon thread:

If anyone needs me, I’ll be out back sharpening my axe.

Call me when the war against the machines begins. Or the people who develop and promote this crap.

I don’t understand, at all, what any of this is about.

If it is, or turns out to be, anything other than a method to divert funds away from idiot investors and channel them toward fraudsters, I’ll eat my hat.

Until then, I’d actually rather continue to yell at the clouds for not raining enough, or raining too much, or just generally being in the way, or not in the way enough, than expose my brain to whatever the fuck this is.

fmbb 2 hours ago | parent | prev | next [-]

Large Language Models have no actual idea of how the world works? News at 11.

kilpikaarna 2 hours ago | parent | prev | next [-]

See, it's the green and woke RLHF making them stupid!

Saline9515 2 hours ago | parent | prev | next [-]

To be fair, many humans fail at the question "How would feel if you didn't have breakfast today?"

TMWNN 2 hours ago | parent | next [-]

Context for others: <https://knowyourmeme.com/memes/the-breakfast-question>

consp 2 hours ago | parent | prev | next [-]

Either I'm one of the stupid ones or this is missing an article.

hikkerl 2 hours ago | parent | prev [-]

>humans

Add it to the list

natmaka 2 hours ago | parent | prev [-]

Too many things are left unsaid => too many assumptions. As usual, even with human beings specifications are key, and context (what each entity knows about the other one or the situation) is an implicit part of them.

You need to specify where the car to be washed is located, and:

- if it's not already at the car wash: whether or not it can drive itself there (autonomous driving)

- otherwise: whether or not you have another car available.

Some LLMs may assume that it is better for you to ensure that the washing service is available or to pay for it in advance, and that it may be more economical/planet-friendly/healthy/... to walk, then check/pay, then if OK to drive back.