MattRogish 2 hours ago

I calculate* the appropriate overlap, and the slicer makes each slice overlap the end of the previous slice by that amount. Some post-processing is needed to reassemble the output, but it's trivial.

[*] SWAG line height, trial and error to figure out the right amount of overlap given LLM error rates, etc.
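For readers who want something concrete, here is a minimal sketch of overlapped slicing plus a naive reassembly step. It is not the poster's code. The function names (`overlap_px`, `slice_ranges`, `merge_lines`) and the pixel-based model are hypothetical. The sketch assumes the overlap is sized as a whole number of estimated line heights. It also assumes the model transcribes the overlapped lines identically in both slices. Given the LLM error rates mentioned above, real output may need fuzzy matching instead of exact equality.

```python
import math


def overlap_px(line_height: float, lines: int) -> int:
    """Hypothetical helper: overlap in pixels for a guessed line height and a chosen number of lines."""
    return math.ceil(line_height * lines)


def slice_ranges(page_height: int, slice_height: int, overlap: int) -> list[tuple[int, int]]:
    """Return (top, bottom) pixel ranges; each slice starts `overlap` px before the previous one ended."""
    if not 0 <= overlap < slice_height:
        raise ValueError("overlap must be in [0, slice_height)")
    step = slice_height - overlap
    ranges = []
    top = 0
    while True:
        bottom = min(top + slice_height, page_height)
        ranges.append((top, bottom))
        if bottom >= page_height:
            break
        top += step
    return ranges


def merge_lines(prev: list[str], nxt: list[str], max_overlap: int) -> list[str]:
    """Append `nxt` to `prev`, dropping the longest prefix of `nxt` (up to max_overlap lines)
    that exactly repeats the tail of `prev`. Exact matching is a simplification."""
    limit = min(max_overlap, len(prev), len(nxt))
    for k in range(limit, 0, -1):
        if prev[-k:] == nxt[:k]:
            return prev + nxt[k:]
    return prev + nxt
```

For example, `slice_ranges(1000, 400, 100)` yields `[(0, 400), (300, 700), (600, 1000)]`. Also, `merge_lines(["a", "b", "c"], ["b", "c", "d"], 3)` yields `["a", "b", "c", "d"]`. Trying the longest candidate overlap first keeps a short accidental match (e.g. a repeated blank line) from winning over the true overlap.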

ryanisnan 2 hours ago

Interesting. Do you have a uniform data set? E.g. documents of a specific type that you know consistently have similar formats, or is this training something you need to do per-document?