arvindh-manian 6 days ago
For reference, I think a common approximation is one token being 0.75 words. For a 100-page book (roughly 375 words per page), that translates to around 50,000 tokens. For 1 million+ tokens, we'd be looking at 2,000+ page books. That's pretty rare, even for documentation. It doesn't have to be text-based, though; I could see films and TV shows becoming increasingly important for long-context model training.
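A quick back-of-envelope sketch of that arithmetic (the 0.75 words-per-token ratio and ~375 words per page are rough assumptions, not exact figures):

    # Rough estimate: assumes ~0.75 words per token and ~375 words per page.
    WORDS_PER_TOKEN = 0.75   # common rule-of-thumb ratio
    WORDS_PER_PAGE = 375     # assumed typical page density

    def tokens_for_pages(pages: int) -> float:
        """Estimate the token count of a book with the given page count."""
        return pages * WORDS_PER_PAGE / WORDS_PER_TOKEN

    def pages_for_tokens(tokens: int) -> float:
        """Estimate how many pages are needed to reach a token budget."""
        return tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE

    print(tokens_for_pages(100))        # -> 50000.0 tokens for a 100-page book
    print(pages_for_tokens(1_000_000))  # -> 2000.0 pages for a 1M-token context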
handfuloflight 6 days ago
What about the role of synthetic data?