energy123 3 days ago
Computer vision went through this 2 decades ago: you need to perturb the input data. The same thing may need to be done in RL pipelines. Someone should make a new public benchmark called GPQA-Perturbed and give the providers something to benchmaxx towards.
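For a multiple-choice benchmark, about the cheapest perturbation is shuffling the answer options per evaluation run, so a model can't lean on a memorized letter. Here's a rough sketch, assuming a simple question/options/correct-index item format. `MCQItem` and `perturb_item` are hypothetical names, not part of any existing benchmark's tooling. A real perturbed benchmark would likely also paraphrase the question or change numeric values, which this doesn't attempt.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class MCQItem:
    """Hypothetical multiple-choice item: question, options, index of the correct option."""
    question: str
    options: tuple[str, ...]
    answer_idx: int


def perturb_item(item: MCQItem, seed: int) -> MCQItem:
    """Return a copy with the options shuffled; the correct answer is re-indexed to its new slot."""
    rng = random.Random(seed)  # seeded so each variant is reproducible
    order = list(range(len(item.options)))
    rng.shuffle(order)
    new_options = tuple(item.options[i] for i in order)
    # The correct option moved from position answer_idx to wherever it landed in `order`.
    new_answer_idx = order.index(item.answer_idx)
    return MCQItem(item.question, new_options, new_answer_idx)


if __name__ == "__main__":
    item = MCQItem(
        "Which gas is most abundant in Earth's atmosphere?",
        ("Oxygen", "Nitrogen", "Argon", "Carbon dioxide"),
        1,
    )
    variant = perturb_item(item, seed=7)
    print(variant.options, variant.answer_idx)
```

The answer text stays the same across variants; only its position moves, so scoring still works against `answer_idx`.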