| ▲ | fragmede 2 hours ago | |
The canonical example I use: how good are you (the philosophical "you") at programming on a whiteboard, given one shot and no tools? Versus at your computer, with access to everything? So judging LLMs on the whiteboard rubric seems as dumb as judging humans by it. | ||