| ▲ | rcxdude 3 hours ago | |
Even if you could do this rigorously (not at all obvious with how LLMs work), it's not a reliable metric: you can easily fabricate debate as well, and in this case the main issue was essentially skimming the surface of the reports and not looking any deeper to see the obvious red flags that it was an april-fools-level fake (which obviously even a person can fall for, but LLMs are being given a far greater level of trust for some reason) | ||