dist-epoch 3 days ago

it's not strictly a counting task: the LLM sees tokens as uniform units, but each token corresponds to a variable number of characters (and that character count is not directly fed into the model)

like the difference between Unicode code-points and UTF-8 bytes, you can't just count UTF-8 bytes to know how many code-points you have
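A minimal sketch of that analogy in Python, assuming nothing beyond the standard library. The helper names are made up for illustration, and the token split at the end is a hypothetical example, not the output of any real tokenizer:

```python
def count_code_points(text: str) -> int:
    # A Python str is a sequence of Unicode code points.
    return len(text)


def count_utf8_bytes(text: str) -> int:
    # Encoding to UTF-8 uses 1 to 4 bytes per code point.
    return len(text.encode("utf-8"))


samples = [
    "abc",                   # ASCII: 1 byte per code point
    "na\u00efve",            # U+00EF takes 2 bytes
    "\u65e5\u672c\u8a9e",    # three CJK characters, 3 bytes each
    "\U0001F642",            # one emoji, 4 bytes
]

for s in samples:
    print(repr(s), count_code_points(s), count_utf8_bytes(s))

# Hypothetical token split (an assumption, not from a real tokenizer):
# each token is one unit to the model, yet covers a different number
# of characters.
hypothetical_tokens = ["str", "aw", "berry"]
chars_per_token = [len(t) for t in hypothetical_tokens]
print(len(hypothetical_tokens), chars_per_token)
```

The byte count alone doesn't tell you the code-point count, and in the same way the token count alone doesn't tell you the character count.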

omnicognate 3 days ago

There's an aspect of figuring out what to count, but that doesn't make this task visual/spatial in any sense I can make out.