How hard can it be to create a universal "correctness" checker? Pretty damn hard!
Our notion of "correct" for most things is basically derived from a very long training run on reality, with the loss function being how long a gene managed to propagate.