| ▲ | cjsaltlake 5 hours ago | |
I think code clash would be quite hard to game or contaminate unintentionally, considering that models need to compete against one another. | ||
| ▲ | gertlabs 4 hours ago | parent | next [-] | |
https://gertlabs.com already does this at scale. An industry-standard benchmark shouldn't be hosted or designed by a lab producing the models, regardless. | ||
| ▲ | Bombthecat 5 hours ago | parent | prev [-] | |
I mean the data / benchmarks | ||