There's also a lot of analogising of this to visual/spatial reasoning, even to the point of talking about "visual illusions", when it's clearly a counting task as the title says.

It makes it tedious to figure out what they actually did (which sounds interesting) when it's couched in such terms and presented in such an LLMified style.

dist-epoch 3 days ago | parent [-]

it's not strictly a counting task: the LLM sees same-sized tokens, but each token corresponds to a variable number of characters (and that number is not directly fed into the model)

like the difference between Unicode code-points and UTF-8 bytes, you can't just count UTF-8 bytes to know how many code-points you have
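
a rough sketch of that analogy in Python (the sample string and the helper name `count_code_points` are just illustrative assumptions, not anything from the article): the byte count and the code-point count disagree, and recovering the second from the first means knowing which bytes to skip

```python
# Illustration only: the sample string is an arbitrary choice.
# Escapes are used so the exact code points are unambiguous.
text = "na\u00efve caf\u00e9 \u2615"

code_points = len(text)                  # Python str length counts code points
utf8_bytes = len(text.encode("utf-8"))   # length after UTF-8 encoding


def count_code_points(data: bytes) -> int:
    """Count code points in valid UTF-8 by skipping continuation bytes.

    Continuation bytes have the bit pattern 10xxxxxx; every other byte
    starts a new code point. Assumes `data` is well-formed UTF-8.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


print(code_points, utf8_bytes, count_code_points(text.encode("utf-8")))
```

here the raw byte count overshoots, and only the version that knows the encoding's structure gets the code-point count back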

	omnicognate 3 days ago | parent [-]
		There's an aspect of figuring out what to count, but that doesn't make this task visual/spatial in any sense I can make out.