yorwba 6 days ago
Anthropic was already special-casing capitalization in their tokenizers before this recent change: https://transformer-circuits.pub/2025/attribution-graphs/met... "The tokenizer the model was trained with uses a special “Caps Lock” token" (⇪). Their visualizations for Claude 3.5 Haiku also show a Title Case token (↑). This is similar to what the TokenMonster tokenizer does: https://github.com/alasdairforsythe/tokenmonster
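To make the idea concrete, here is a rough sketch of how case-modifier tokens can work in general. This is not Anthropic's or TokenMonster's actual scheme; the function names `fold_case` and `unfold_case` are made up for illustration, and the exact semantics of the ⇪ and ↑ markers here (each one applies to the single following word) are an assumption. It only handles simple casing; mixed-case words are passed through unchanged.

```python
# Hypothetical markers, borrowed from the symbols in the visualizations.
CAPS_LOCK = "⇪"   # assumed meaning: the next word is ALL CAPS
TITLE_CASE = "↑"  # assumed meaning: the next word is Capitalized


def fold_case(words):
    """Lowercase simple-cased words and emit a marker token before them."""
    out = []
    for w in words:
        if len(w) > 1 and w.isupper():
            out += [CAPS_LOCK, w.lower()]
        elif w[:1].isupper() and w[1:] == w[1:].lower():
            out += [TITLE_CASE, w.lower()]
        else:
            # Already lowercase, or mixed case like "iPhone": keep as-is.
            out.append(w)
    return out


def unfold_case(tokens):
    """Invert fold_case by applying each marker to the word after it."""
    out = []
    pending = None
    for t in tokens:
        if t in (CAPS_LOCK, TITLE_CASE):
            pending = t
        elif pending == CAPS_LOCK:
            out.append(t.upper())
            pending = None
        elif pending == TITLE_CASE:
            out.append(t[:1].upper() + t[1:])
            pending = None
        else:
            out.append(t)
    return out


words = ["Hello", "NASA", "world", "iPhone"]
folded = fold_case(words)
restored = unfold_case(folded)
```

The point of doing this is that "hello", "Hello" and "HELLO" can all share one vocabulary entry, at the cost of an extra marker token for the capitalized forms.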