| MattRogish 2 hours ago | |
I calculate* the appropriate overlap, and the slicer repeats that much of the previous slice at the start of the next one. Some post-processing is required to reassemble the output, but it's trivial. [*] SWAG on line height, plus trial and error to figure out the right amount of overlap given LLM error rates, etc. | ||
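For readers who want something concrete, here is a minimal sketch of the overlap-and-reassemble idea, not the commenter's actual code. It assumes the input is sliced by pixel rows and that each slice comes back as a list of text lines; `slice_bounds`, `stitch`, and all parameter names are hypothetical. The stitching step assumes the repeated lines match exactly, so with real LLM output you would likely need fuzzy matching instead.

```python
def slice_bounds(total_height, slice_height, overlap):
    """Return (top, bottom) row ranges; each slice repeats `overlap` rows of the previous one."""
    if total_height <= 0 or slice_height <= 0:
        raise ValueError("heights must be positive")
    if not 0 <= overlap < slice_height:
        raise ValueError("overlap must be >= 0 and smaller than slice_height")
    step = slice_height - overlap  # how far each new slice advances
    bounds = []
    top = 0
    while True:
        bottom = min(top + slice_height, total_height)
        bounds.append((top, bottom))
        if bottom >= total_height:
            break
        top += step
    return bounds


def stitch(chunks):
    """Join per-slice line lists, dropping lines duplicated at each slice boundary.

    Assumption: duplicated lines are identical strings. A document that
    genuinely repeats a line right at a boundary would lose one copy.
    """
    merged = []
    for lines in chunks:
        max_k = min(len(merged), len(lines))
        # Longest suffix of what we have that equals a prefix of the new chunk.
        k = next(
            (n for n in range(max_k, 0, -1) if merged[-n:] == lines[:n]),
            0,
        )
        merged.extend(lines[k:])
    return merged
```

In practice the overlap would be chosen to cover at least one full line of text (the line-height guess above), so every line appears whole in at least one slice.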
| ryanisnan 2 hours ago | |
Interesting. Do you have a uniform data set, e.g. documents of a specific type that you know consistently share a similar format? Or is this training something you need to do per document? | ||