JustFinishedBSG 6 hours ago
Interesting, it doesn't seem intuitive at all to me. My (wrong?) understanding was that there was a positive correlation between how "good" a tokenizer is in terms of compression and the downstream model performance. Guess not.