I believe everyone should run their own evals on their own tasks or use cases.
Shameless plug, but I made a simple app that lets anyone create their own evals locally:
https://eval.16x.engineer/