Could this be the result of some kind of caching? I suppose with an LLM they can't cache prompts directly, but they could group similar prompts by their embeddings and return a precomputed, most-common response for each group (this is just a theory).
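
Roughly what I have in mind, as a purely hypothetical sketch (the `embed()` function, the `SemanticCache` class, and the 0.95 threshold are all made up for illustration; a real provider would use a proper embedding model, not this toy bigram hash):

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder for a real embedding model: hashes lowercase character
    # bigrams into a fixed-size vector and normalizes it to unit length.
    vec = np.zeros(256)
    text = text.lower()
    for a, b in zip(text, text[1:]):
        vec[(ord(a) * 31 + ord(b)) % 256] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

class SemanticCache:
    """Cache responses keyed by prompt embeddings; reuse a cached answer
    when a new prompt is 'close enough' to a previously seen one."""

    def __init__(self, threshold: float = 0.95):
        self.threshold = threshold
        self.entries: list[tuple[np.ndarray, str]] = []

    def lookup(self, prompt: str) -> str | None:
        q = embed(prompt)
        for vec, response in self.entries:
            # Cosine similarity (vectors are already unit-normalized).
            if float(np.dot(q, vec)) >= self.threshold:
                return response
        return None

    def store(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))

cache = SemanticCache()
cache.store("What is the capital of France?", "Paris.")
# A near-identical prompt lands above the threshold and hits the cache.
print(cache.lookup("What is the capital of France"))  # -> "Paris."
```

If something like this were in place, slightly different prompts would collapse onto the same cached answer, which would explain seeing suspiciously identical responses.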