The problem is the non deterministic part
An LLM can generate many valid reasons for doing X, including the right one, but you can't know which one of those it will give you