There are lots of benchmarks to compare the absolute values of different models on the same scale (as opposed to vibes (my apologies for the shorthand), etc.).