gunalx 4 hours ago

Does it support multimodal documents?

My main gripe with openwebui, in addition to it being slow, is that it mangles documents in the OCR step. Tables that a multimodal LLM could have understood just fine get mangled by the OCR and lost, instead of the system storing both a text and an original representation.

Being able to properly search the knowledge base the way the LLM does, but manually, would be nice (e.g. getting recommendations for docs to add).

My use case is mostly writing, so an integrated document-refinement editor is also a nice feature.

I'm probably rambling, but these are the base use cases I've personally found for an LLM UI.

Weves 3 hours ago | parent

What format are the docs being uploaded as? By default, images uploaded into the chat are passed through directly; PDFs are parsed and fed to the LLM as text.

Writing is a really common use case, and something we'd like to explore more. Currently people often use Onyx for "write something combining X, Y, and Z documents," but I feel that's just scratching the surface.

gunalx 2 hours ago | parent

I was mostly ranting about open-webui and hoping Onyx would be better than the current state. My use case involves PDFs with lots of complex figures, OCR'd through Mistral OCR, which gives text plus images for the figures (I've tried multiple other tools as well). I would really like to keep the figures as images, since OCR captions really struggle to capture their full semantic meaning.
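For what it's worth, a minimal sketch of what I mean by "keep the figures as images": store the OCR text and the original figure images side by side, and hand both to a multimodal model using the OpenAI-style content-parts message format. The function name and data shapes here are my own illustration, not any tool's actual API:

```python
import base64

def build_multimodal_message(page_text, figure_images):
    """Build an OpenAI-style chat message that carries both the OCR'd
    text and the original figure images, so a multimodal LLM can read
    the figures directly instead of relying on a lossy OCR caption.

    figure_images: list of raw image bytes (assumed PNG here)."""
    parts = [{"type": "text", "text": page_text}]
    for img_bytes in figure_images:
        b64 = base64.b64encode(img_bytes).decode("ascii")
        parts.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64}"},
        })
    return {"role": "user", "content": parts}

# One text part plus one image part per figure.
msg = build_multimodal_message("OCR'd page text...", [b"\x89PNG fake bytes"])
print(len(msg["content"]))
```

The point is just that the image bytes survive alongside the text representation, rather than being flattened into a caption at ingest time.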

But I'm stoked to see alternatives in this area; I'll try it out once I get some time.