Remix.run Logo
mpesce 12 hours ago

AI evals are becoming intractable for the same reasons that measuring human general intelligence has been intractable for 150 years: the constructs resist decomposition, the benchmarks saturate, and the goalposts move. The reasons are the same because the thing being measured is the same. The difficulty is the evidence.