amelius 2 hours ago

There should be a way to turn the questions we ask LLMs into benchmarks.

That way, we can have a benchmark that is always up to date.