The question is not stupid. It might be banal, but so is "what is 2+2". It shows the limitations of LLMs: in this specific case, how they lose track of which object is which.