xodn348 5 hours ago
Really interesting approach. Attacking token efficiency at the encoding level is more fundamental than what I did. Even without retraining BPE from scratch, starting with YUTF-8 and measuring how existing tokenizers handle it would already be a worthwhile experiment. Hope you find the time to build it, good luck!