infecto 7 days ago
Multimodal LLMs are not yet the way to do this for a business workflow. In my experience you're much better off starting with Azure Doc Intelligence or AWS Textract to first get the structure of the document (PDF). These tools are incredibly robust and do a great job with most of the common cases you can throw at them. From there you can use an LLM to interrogate and structure the data to your heart's delight.
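A rough sketch of that two-stage flow, in case it helps. It assumes the OCR step has already returned a Textract-style response (a `Blocks` list whose `LINE` entries carry `Text` and, for multi-page jobs, `Page`). The helper names `lines_by_page` and `build_extraction_prompt`, the field names, and the sample data are all made up for illustration; the actual LLM call is left out because it depends on your provider.

```python
from collections import defaultdict


def lines_by_page(textract_response):
    """Group the text of LINE blocks by page number, in response order."""
    pages = defaultdict(list)
    for block in textract_response.get("Blocks", []):
        if block.get("BlockType") == "LINE":
            # "Page" may be missing on single-page responses; default to 1.
            pages[block.get("Page", 1)].append(block.get("Text", ""))
    return dict(pages)


def build_extraction_prompt(pages, fields):
    """Build a plain-text prompt asking an LLM to return `fields` as JSON."""
    parts = []
    for page in sorted(pages):
        parts.append(f"--- page {page} ---")
        parts.extend(pages[page])
    document_text = "\n".join(parts)
    return (
        "Extract the following fields from the document and answer in JSON: "
        + ", ".join(fields)
        + "\n\n"
        + document_text
    )


# Hand-written sample standing in for a real OCR response (hypothetical data).
sample = {
    "Blocks": [
        {"BlockType": "PAGE", "Page": 1},
        {"BlockType": "LINE", "Page": 1, "Text": "Invoice No: 1001"},
        {"BlockType": "WORD", "Page": 1, "Text": "Invoice"},
        {"BlockType": "LINE", "Page": 1, "Text": "Total: 42.00 EUR"},
    ]
}

pages = lines_by_page(sample)
prompt = build_extraction_prompt(pages, ["invoice_number", "total"])
print(prompt)
```

The point of the split is that the LLM only ever sees text the OCR service has already laid out, so it is structuring data rather than reading pixels.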
disgruntledphd2 7 days ago
> AWS Textract to first get the structure of the document (PDF). These tools are incredibly robust and do a great job with most of the common cases you can throw at it.

Do they work for Bills of Lading yet? When I tested a sample of these bills a few years back (2022 I think), the results were not good at all. But I honestly wouldn't be surprised if they'd massively improved lately.
IndieCoder 7 days ago
Plus one, I'm using this exact setup to make it scale. If Azure Doc Intelligence gets too expensive, VLMs also work great.