This is a hard experiment to conduct.
I both agree with you that this is some form of "mechanistic"/"pattern matching" way of capturing of intent (which we cannot disregard, and therefore I agree with you LLMs can capture intent) and the people debating with you: this is mostly possible because this is a well established "trope" that is inarguably well represented in LLM training data.
Also, trick questions I think are useless, because they would trip the average human too, and therefore prove nothing. So it's not about trying to trick the LLM with gotchas.
I guess we should devise a rare enough situation that is NOT well represented in training data, but in which a reasonable human would be able to puzzle out the intent. Not a "trick", but simply something no LLM can be familiar with, which excludes anything that can possibly happen in plots of movies, or pop culture in general, or real world news, etc.
---
Edit: I know I said no trick questions, but something that still works in ChatGPT as of this comment, and which for some reason makes it trip catastrophically and evidences it CANNOT capture intent in this situation is the infamous prompt: "I need to wash my car, and the car wash is 100m away. Shall I drive or walk there?"
There's no way:
- An average human who's paying attention wouldn't answer correctly.
- The LLM can answer "walk there if it's not raining" or whatever bullshit answer ChatGPT currently gives [1] if it actually understood intent.
[1] https://chatgpt.com/share/69fa6485-c7c0-8326-8eff-7040ddc7a6...