Show HN: Quicktok, an exact BPE tokenizer 7x faster than tiktoken (github.com)
2 points by dmatth1 8 hours ago | 1 comment
dmatth1 8 hours ago
quicktok runs the same algorithm as bpe-openai (exact backtracking BPE) but applies a number of data-structure optimizations that cut memory accesses, which is where the speedup (~7x over tiktoken) comes from. Output is byte-identical to tiktoken, so it can be a drop-in replacement for anyone doing lots of corpus ingestion, search indexing, etc. Happy to answer any questions. If you find any input where quicktok's ids differ from tiktoken's, that's a bug, so please report it.
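
For anyone who wants to hunt for such mismatches, here is a minimal differential-test sketch. It is not taken from the quicktok repo: `first_mismatch`, `byte_encoder`, `buggy_encoder`, and `samples` are hypothetical names, and the two toy encoders are stand-ins so the snippet runs without either library installed. The harness only assumes each tokenizer can be wrapped as a function from a string to a sequence of ids.

```python
from typing import Callable, Iterable, List, Optional, Sequence

# Any tokenizer wrapped as "string in, sequence of token ids out".
Encoder = Callable[[str], Sequence[int]]


def first_mismatch(
    reference: Encoder, candidate: Encoder, inputs: Iterable[str]
) -> Optional[str]:
    """Return the first input whose ids differ, or None if all inputs agree."""
    for text in inputs:
        if list(reference(text)) != list(candidate(text)):
            return text
    return None


# Stand-in encoders (hypothetical, for illustration only): these are NOT
# tiktoken or quicktok, just toy functions so the harness can be exercised.
def byte_encoder(text: str) -> List[int]:
    return list(text.encode("utf-8"))


def buggy_encoder(text: str) -> List[int]:
    # Deliberately wrong: drops trailing whitespace before encoding.
    return list(text.rstrip().encode("utf-8"))


samples = ["hello world", "", "naïve café", "trailing space ", "日本語"]

if __name__ == "__main__":
    print(first_mismatch(byte_encoder, byte_encoder, samples))
    print(repr(first_mismatch(byte_encoder, buggy_encoder, samples)))
```

To run this against the real libraries, the reference could be something like `tiktoken.get_encoding("cl100k_base").encode`, and the candidate would be whatever encode function quicktok exposes (its API is not shown in this thread, so it is left out here). Any input the harness returns is worth attaching to a bug report.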