cjsaltlake 5 hours ago

Code clash, I think, would be quite hard to game or to contaminate unintentionally, considering that the models have to compete against one another.

gertlabs 4 hours ago

https://gertlabs.com already does this at scale.

An industry-standard benchmark shouldn't be hosted or designed by a lab producing the models, regardless.

Bombthecat 5 hours ago

I mean the data/benchmarks.