rglullis 6 hours ago

Oh, can we do Civilization next?

rkozik1989 4 hours ago

You do know we're hemorrhaging a lot of finite resources to play these games badly, right? We're basically at lying-on-a-chaise-lounge-being-fed-grapes levels of hedonism. "Make me a racist meme that infringes the copyrights of multiple IP holders, and when you're done, play Sim City at the competency level of a blind man."

staticshock 3 hours ago

I think the way to see this is as the organic process of discovering hard-to-game benchmarks. The loop is:

1. People discover things LLMs can kind of do, but very poorly.

2. Frontier labs sample these discoveries and incorporate them into benchmarks to monitor internally.

3. The next-generation model improves on said benchmarks, and those improvements generalize to loosely correlated real-world tasks.