anuramat | 5 days ago
Literally every single one? To not mess it up, they either have to spell the word l-i-k-e t-h-i-s in the output/CoT first (which only works if the tokenizer treats every letter as a separate token), or have the exact question in the training set — and all of that assumes the model can spell every token to begin with. Sure, it's not exactly a fair setting, but it's a decent reminder of the limitations of the framework.
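To make the point concrete, here's a toy sketch (the token segmentation is hypothetical, not from any real tokenizer): the model's input is chunk-level, so per-letter counts aren't directly visible, and the "spell it out" workaround amounts to expanding the word to one letter per step before counting.

```python
word = "strawberry"

# Hypothetical BPE-style segmentation: the model sees chunks, not letters.
toy_tokens = ["str", "aw", "berry"]
assert "".join(toy_tokens) == word

# The spell-it-out workaround: expand to individual letters first,
# then count -- roughly what the CoT has to do explicitly.
letters = list(word)
count_r = letters.count("r")
print(count_r)  # 3
```

None of the chunk tokens equals "r", which is why counting at the token level fails unless the model has memorized the spelling of each chunk.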