Show HN: Quicktok, an exact BPE tokenizer 7x faster than tiktoken (github.com)
2 points by dmatth1 8 hours ago | 1 comment
dmatth1 8 hours ago

quicktok runs the same algorithm as bpe-openai (exact backtracking BPE) but adds a number of data-structure optimizations that cut memory accesses, which is where the speedup (~7x over tiktoken) comes from. Output is byte-identical to tiktoken's, so it can be a drop-in replacement for anyone doing lots of corpus ingestion, search indexing, etc.

Happy to answer any questions. If you find any input where quicktok's ids differ from tiktoken's, that's a bug! Please report it.
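
For anyone who wants to go looking for mismatches, here is a rough sketch of a differential check. It is not taken from the quicktok repo: `find_mismatches`, `toy_reference`, `toy_buggy` and `SAMPLES` are hypothetical names made up for illustration, and the two toy encoders are stand-ins so the snippet runs without either library installed. The idea is just to feed the same strings to two encoders and collect every input where the id lists differ.

```python
from typing import Callable, Iterable, List, Tuple

# An encoder is anything that maps a string to a list of token ids.
Encoder = Callable[[str], List[int]]


def find_mismatches(
    reference: Encoder, candidate: Encoder, inputs: Iterable[str]
) -> List[Tuple[str, List[int], List[int]]]:
    """Return (text, reference_ids, candidate_ids) for every differing input."""
    mismatches = []
    for text in inputs:
        expected = list(reference(text))
        actual = list(candidate(text))
        if expected != actual:
            mismatches.append((text, expected, actual))
    return mismatches


# Hypothetical stand-ins, NOT real tokenizers: they only make the sketch runnable.
def toy_reference(text: str) -> List[int]:
    return list(text.encode("utf-8"))


def toy_buggy(text: str) -> List[int]:
    # Deliberately drops non-ASCII bytes to show what a reportable case looks like.
    return [b for b in text.encode("utf-8") if b < 128]


# Assumed sample inputs; a real run would use a much larger and nastier corpus.
SAMPLES = ["hello world", "", "  leading spaces", "naïve café", "日本語"]

if __name__ == "__main__":
    for text, expected, actual in find_mismatches(toy_reference, toy_buggy, SAMPLES):
        print(repr(text), expected, actual)
```

To run it for real, pass tiktoken's encoder as `reference` (for example `tiktoken.get_encoding("cl100k_base").encode`) and quicktok's encode function as `candidate`. The post does not show quicktok's Python API, so check the repo for the actual call. Any tuple the function returns is a candidate bug report.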