> seem to be fine
Now ask the same model the same question several times, each time in a different context, and count what percentage of the time it’s correct.
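Here's a minimal sketch of that procedure in Python. It assumes a hypothetical `ask(context, question)` callable that wraps whatever model API you're using, plus a hypothetical `is_correct` checker for the answer. `fake_model` is only a deterministic stand-in so the example runs; it is not a real model.

```python
from typing import Callable, Sequence


def percent_correct(
    ask: Callable[[str, str], str],
    question: str,
    contexts: Sequence[str],
    is_correct: Callable[[str], bool],
) -> float:
    """Ask the same question once per context; return the percentage judged correct."""
    if not contexts:
        raise ValueError("need at least one context")
    correct = sum(1 for ctx in contexts if is_correct(ask(ctx, question)))
    return 100.0 * correct / len(contexts)


# Hypothetical stand-in for a real model call: it answers wrongly
# whenever the preceding context mentions "trick".
def fake_model(context: str, question: str) -> str:
    return "5" if "trick" in context else "4"


contexts = [
    "",
    "You are a helpful assistant.",
    "Earlier we discussed a trick question.",
    "Answer tersely.",
]
score = percent_correct(fake_model, "What is 2 + 2?", contexts, lambda a: a.strip() == "4")
print(f"{score:.1f}% correct")  # 75.0% correct
```

With only a handful of repetitions the percentage is a noisy estimate, so running more contexts gives a steadier number.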