zahlman | 2 days ago
> This one is bizarre, if true (I'm not convinced it is).

> The entire purpose of the attention mechanism in the transformer architecture is to build this representation, in many layers (conceptually: in many layers of abstraction).

I think this is really about a hidden (i.e. not readily communicated) difference in what the word "meaning" means to different people.
erichocean | 2 days ago | parent
Could be. By "meaning" I mean (heh) that transformers are able to distinguish tokens (and prompts) in a consequential ("causal") way, and that they do so at various levels of detail ("abstractions"). I think that's the usual understanding of how transformer architectures work, at the level of the math.
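For anyone who wants that "level of the math" made concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention. The shapes, weight matrices, and random embeddings are illustrative assumptions, not taken from any particular model; it only shows how query-key similarity turns each token into a context-dependent mixture of the others, which is the sense in which tokens get "distinguished" consequentially.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """One attention head: each token's output is a weighted mix of all
    value vectors, with weights given by query-key similarity."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # (seq, seq) similarity matrix
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over keys
    return weights @ V                                  # (seq, d_v) contextualized vectors

# Toy example: 4 tokens with 8-dimensional embeddings (all values hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                             # token embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(X @ Wq, X @ Wk, X @ Wv)
print(out.shape)  # (4, 8): each token's vector now depends on the whole prompt
```

Stacking many such layers (with different learned projections per layer and head) is what gives the "various levels of detail" part: later layers attend over representations that are already mixtures, so the distinctions they draw are increasingly abstract.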