groby_b 7 days ago
That's not what [1] says, though? Quoth: "As of March 1, 2023, data sent to the OpenAI API will not be used to train or improve OpenAI models (unless you explicitly opt-in to share data with us, such as by providing feedback in the Playground)."

> Traditional methods (PDF parsers with OCR support) are cheaper, more reliable

Not sure about the reliability: the ones I'm using all fail at structured data. If you want a table extracted from a PDF, LLMs are your friend. (Recommendations welcome.)
niklasd 7 days ago
We found that OpenAI's LLMs aren't great for extracting tables. What is working well for us is Docling (https://github.com/DS4SD/docling/).
emmanueloga_ 7 days ago
> That's not what [1] says, though?

Documind is using https://api.openai.com/v1/chat/completions; check the docs at the end of the long API table [1]:

> * Chat Completions:
> Image inputs via the gpt-4o, gpt-4o-mini, chatgpt-4o-latest, or gpt-4-turbo models (or previously gpt-4-vision-preview) are not eligible for zero retention.

--

1: https://platform.openai.com/docs/models#how-we-use-your-data