Remix.run Logo
akoboldfrying 2 hours ago

The tokenisation needs to be general -- it needs to be able to encode any possible input. It should also be at least moderately efficient across the distribution of inputs that it will tend to see. Existing tokenisation schemes explicitly target this.