I don't think the comparison to humans works. You seem to assume that we can easily train many different LLMs to solve the originality problem, but that is far from guaranteed.