zahlman | 3 hours ago
> In the training data, different units are going to share near-identical grammatical roles and positions in sentences.

Yes, but I would also expect the training data to include tons of examples of students doing unit-conversion homework, resources explaining the concept, and so on. So I would expect the embedding space to naturally include dimensions representing some kind of "metric-system-ness," simply because of all the data talking about the metric system. And I understand that LLMs can do arithmetic reasonably well even without tool use (though for some reason accuracy depends on how big the numbers are, so presumably the internal logic is rather different from textbook algorithms).
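As a toy illustration (with made-up 3-dimensional vectors, not real model weights), the "metric-system-ness dimension" idea is the same mechanism as the classic word2vec analogy: if embeddings encode a unit-system axis, then vector arithmetic like `km - mile + pound` should land nearest to `kg`:

```python
import numpy as np

# Invented toy embeddings for illustration only.
# Dimensions (hypothetically): metric-ness, length-ness, mass-ness.
emb = {
    "km":    np.array([1.0, 1.0, 0.0]),
    "mile":  np.array([0.0, 1.0, 0.0]),
    "kg":    np.array([1.0, 0.0, 1.0]),
    "pound": np.array([0.0, 0.0, 1.0]),
}

def cosine(a, b):
    # Cosine similarity between two vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "km - mile" isolates the metric direction; adding "pound" moves
# that direction into mass space, so the nearest unit should be "kg".
query = emb["km"] - emb["mile"] + emb["pound"]
best = max(emb, key=lambda w: cosine(query, emb[w]))
print(best)  # → kg
```

Of course, a real model's embedding space is high-dimensional and the "metric" direction, if one exists, would be learned rather than hand-assigned like this.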