energy123 3 days ago

Computer vision went through this two decades ago. You need to perturb the input data. The same thing may need to be done in RL pipelines.

Someone should make a new public benchmark called GPQA-Perturbed. Give the providers something to benchmaxx towards.
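
As a rough illustration of what perturbing a benchmark item could look like, here is a minimal sketch that shuffles the answer options of a multiple-choice question while tracking the correct answer. The function name `perturb_item`, the item layout, and the choice of option shuffling as the perturbation are all assumptions for illustration; none of this comes from an existing benchmark.

```python
import random


def perturb_item(question, options, answer_index, seed):
    """Return the item with its options shuffled (hypothetical helper).

    The question text is left unchanged; only the option order is perturbed,
    and the index of the correct answer is remapped to match.
    """
    rng = random.Random(seed)  # seeded so each perturbed variant is reproducible
    order = list(range(len(options)))
    rng.shuffle(order)
    shuffled = [options[i] for i in order]
    # Position of the original correct option within the new ordering.
    new_answer_index = order.index(answer_index)
    return question, shuffled, new_answer_index


# Made-up example item, not taken from any real dataset.
question, options, answer = perturb_item(
    "Which particle carries the electromagnetic force?",
    ["gluon", "photon", "W boson", "graviton"],
    answer_index=1,
    seed=7,
)
```

A model that only memorized the answer letter for the original ordering would lose accuracy across seeds, while one that actually knows the answer should not.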