MattRogish 2 hours ago
I do OCR on images, and that's exactly what I do: I take one big image, slice it into many smaller ones, and send those to the LLM. Perfect every time, unlike sending the whole image, which resulted in hot garbage.
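A minimal sketch of the slicing step described above, not the commenter's actual code. The function name `tile_boxes`, the tile sizes, and the `overlap` parameter are all hypothetical. The overlap is an added assumption: it gives text that falls on a tile boundary a chance to appear whole in at least one tile.

```python
def tile_boxes(width, height, tile_w, tile_h, overlap=0):
    """Return (left, upper, right, lower) boxes that cover a width x height image.

    Neighbouring boxes share at least `overlap` pixels, so a word cut by one
    tile edge is more likely to appear intact in the adjacent tile.
    """
    if tile_w <= 0 or tile_h <= 0:
        raise ValueError("tile dimensions must be positive")
    if not 0 <= overlap < min(tile_w, tile_h):
        raise ValueError("overlap must be non-negative and smaller than the tile")

    def starts(size, tile):
        # A single tile is enough when the image fits inside it.
        if size <= tile:
            return [0]
        step = tile - overlap
        positions = list(range(0, size - tile, step))
        # Pin the last tile to the far edge so nothing is left uncovered.
        positions.append(size - tile)
        return positions

    return [
        (x, y, min(x + tile_w, width), min(y + tile_h, height))
        for y in starts(height, tile_h)
        for x in starts(width, tile_w)
    ]


if __name__ == "__main__":
    for box in tile_boxes(1000, 300, 400, 400, overlap=50):
        print(box)
```

Each box uses the (left, upper, right, lower) convention that Pillow's `Image.crop` accepts, so the tiles could be cut with `image.crop(box)` before sending them on. With overlap, the same word can show up in two tiles, so the per-tile results would need de-duplicating when they are merged.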
freefaler an hour ago
It works with relatively good scans. With bad or skewed scans, and especially with documents that have many label/value pairs that aren't nicely tucked inside sentences, context matters: the more context you have, the better you can find the correct words and fix the errors. There is a whole class of tricky documents. A decent (if you ignore the marketing bias) post about this problem can be found here:
ryanisnan 34 minutes ago
How do you know where to slice an image? What if you slice an image mid-word?