That makes sense; however, it doesn't seem like they check the LLM outputs against the known solution. Maybe I missed that in the article.