If a stranger asks me, "Should I walk or drive to this car wash?" I assume they're asking in good faith and that both options are reasonable for their situation. So it's a safe assumption that they're not going there to get their car washed. Maybe they're starting work there tomorrow, for example, and don't know how pedestrian-friendly the route is.

Is the goal behind evaluating models this way to incentivize training them to assume we're bad-faith tricksters even when asking benign questions like how best to traverse a particular 100m? I can't imagine why it would be desirable to optimize for that outcome.

(I'm not saying that's your goal personally - I mean the goal behind the test itself, which I'd heard of before this thread. Seems like a bad test.)

zamalek 12 hours ago | parent [-]

> I need to get my car washed; should I drive or walk to the car wash that is 100m away?

> Walking 100 m is generally faster, cheaper, and better for the environment than driving such a short distance. If you have a car that’s already running and you don’t mind a few extra seconds, walking also avoids the hassle of finding parking or worrying about traffic.

	rtfeldman 9 hours ago | parent [-]
		That's a much better test!