HarHarVeryFunny 6 days ago
Is it really necessary to split it into pages? Not so bad if you automate it, I suppose, but aren't there models that will accept a large PDF directly? (I know Sonnet has a 32 MB limit.)
7thpower 6 days ago
They are limited in how much they can output, and there is generally an inverse relationship between the number of tokens you send and output quality after the first 20-30 thousand tokens.
therealpygon 5 days ago
Necessary? No. Better? Probably. Despite larger context windows, attention and hallucination problems aren't a thing of the past, even within today's expanded contexts. Splitting into individual pages likely keeps you well within a context size that avoids most of these issues: asking an LLM to maintain attention over a single page is much more achievable than over an entire book. Also, PDF file size is a poor proxy for token count, since a PDF can be anything from a collection of high-quality JPEG images to thousands of pages of text.
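The per-page idea generalizes to any long document: greedily pack text into chunks that stay under a token budget and make one LLM call per chunk. A minimal stdlib sketch, where the ~4-characters-per-token heuristic and the 2,000-token default budget are illustrative assumptions, not measured values:

```python
def chunk_text(text: str, token_budget: int = 2000) -> list[str]:
    """Greedily pack paragraphs into chunks of at most ~token_budget tokens."""
    char_budget = token_budget * 4  # rough heuristic: ~4 chars per token
    chunks, current = [], ""
    for para in text.split("\n\n"):
        candidate = (current + "\n\n" + para).strip()
        if current and len(candidate) > char_budget:
            # Adding this paragraph would blow the budget: flush and start fresh.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Splitting on paragraph boundaries rather than at a fixed character offset keeps each chunk semantically coherent, which matters more for attention than hitting the budget exactly.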
siva7 6 days ago
They all accept large PDFs (or any other kind of input), but the quality of the output will suffer for various reasons.