Retr0id (3 hours ago):
Could you elaborate on "token selection is based off normalization"?
Ucalegon (2 hours ago):
Sure; see https://arxiv.org/pdf/1607.06450 (the Layer Normalization paper). Depending on the model architecture, normalization happens in several different places. It saves compute, because normalized activations let training converge faster, and it keeps outputs (somewhat) consistent. Training is, by its very nature, also a kind of normalization: you are telling the model which outputs are and are not valid, and that shapes the weights that define features.
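For anyone who hasn't opened the paper, here is a minimal sketch of the layer normalization it describes. It is written in plain Python and assumes a single activation vector. The steps are: subtract the mean over the features, divide by the standard deviation, then apply a learned per-feature gain and bias (`gamma`, `beta`). The function name `layer_norm` and the small `eps` stabilizer are my own illustrative choices, not code taken from the paper or from any particular framework.

```python
import math
from typing import List, Optional


def layer_norm(x: List[float],
               gamma: Optional[List[float]] = None,
               beta: Optional[List[float]] = None,
               eps: float = 1e-5) -> List[float]:
    """Hypothetical helper: normalize one activation vector across its features."""
    h = len(x)
    mean = sum(x) / h
    # Population variance over the H features of this single vector.
    var = sum((v - mean) ** 2 for v in x) / h
    # eps (an assumed default) guards against division by zero for constant inputs.
    inv_std = 1.0 / math.sqrt(var + eps)
    # Learned gain and bias; default to identity (gain 1, bias 0) when not given.
    gamma = gamma if gamma is not None else [1.0] * h
    beta = beta if beta is not None else [0.0] * h
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]


if __name__ == "__main__":
    small = layer_norm([1.0, 2.0, 3.0, 4.0])
    large = layer_norm([10.0, 20.0, 30.0, 40.0])
    # Both inputs normalize to (nearly) the same output despite a 10x scale difference.
    print([round(v, 4) for v in small])
    print([round(v, 4) for v in large])
```

With the default gain of 1 and bias of 0, the output has mean 0 and variance close to 1 regardless of the input's scale. That is one concrete sense of the "consistency" mentioned above.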