causal 8 hours ago

You seem to be going off the title, which is plainly incorrect and not what the paper says. The paper demonstrates HOW different models can learn similar representations due to "data, architecture, optimizer, and tokenizer".

"How Different Language Models Learn Similar Number Representations" (actual title) is distinctly different from "Different Language Models Learn Similar Number Representations" - the latter implying some immutable law of the universe.

dnautics 6 hours ago | parent | next [-]

> latter implying some immutable law of the universe

I think the implication is slightly weaker -- it implies some immutable law of training datasets?

NooneAtAll3 3 hours ago | parent | prev [-]

I don't understand your argument

"How X happens" still implies that X happens; it just adds additional explanation on top

causal 2 hours ago | parent [-]

"How" = it can happen

Without "How" = it will happen