Which benchmarks are not garbage?
I don't consider myself super special, so I think it should be doable to create a benchmark that beats me having to test every single new model myself.