mkaszkowiak 3 hours ago

What was your approach to benchmarking an adversarial agent?

I ran into this as an open problem in a different domain: the search space can be really wide, which makes it hard to measure results for non-trivial tasks.

Would be really interested if you can share your eval approach :)