| AIhumanbench 8 hours ago | |
aihumanbench.com | ||
| rad-b 8 hours ago | |
Seems interesting, but testing myself only yields my own results. How would I compare them to a frontier model's? That part seems to be missing. Also, the tests seem heavily skewed toward what LLMs are good at. | ||