barrell 6 hours ago

Yes. I spent about three days trying to optimize the prompt to get gpt-5 to stop producing gibberish, to no avail. Completions took several minutes, had a timeout rate above 50% (with a 6-minute timeout, mind you), and even after retrying they still returned gibberish about 15% of the time (12% on one task, 20% on another).

I then tried multiple models, and they all failed in spectacular ways. Only Grok and Mistral had an acceptable success rate, although Grok did not follow the formatting instructions as well as Mistral.

Phrasing is a language-learning application, so the formatting is very complicated: multiple languages and multiple scripts intertwined with markdown formatting. I include dozens of examples in the prompts, but it's something many models struggle with.
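To give a flavor of what I mean, here's a made-up sketch in Python (not the real prompt; the example pairs and formatting rules here are hypothetical):

    # Hypothetical few-shot structure: each example pairs an input with the
    # exact markdown the model must reproduce, mixing scripts inline.
    FORMAT_EXAMPLES = [
        {"input": "こんにちは", "output": "**こんにちは** (*konnichiwa*): \"hello\""},
        {"input": "Здравствуйте", "output": "**Здравствуйте** (*zdravstvuyte*): \"hello (formal)\""},
    ]

    def build_prompt(phrase: str) -> str:
        shots = "\n\n".join(
            f"Input: {ex['input']}\nOutput: {ex['output']}" for ex in FORMAT_EXAMPLES
        )
        return f"{shots}\n\nInput: {phrase}\nOutput:"

The failure mode is the model drifting away from that output shape, not producing bad translations per se.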

This was a few months ago, so to be fair, it's possible gpt-5.1, gemini-3, or the new deepseek model has caught up. I have not had the time or need to compare, as Mistral has been sufficient for my use cases.

I mean, I'd love to get that 0.1% error rate down, but there have always been more pressing issues XD

data-ottawa 6 hours ago | parent | next

With gpt-5, did you try adjusting the reasoning level to "minimal"?

I tried using it for a very small, quick summarization task that needed low latency, and any level above minimal took several seconds to return a response. Using minimal brought that down significantly.

Weirdly, gpt-5's reasoning levels don't map onto the reasoning_effort levels in the OpenAI API.
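For reference, this is roughly what that looks like against the API (a sketch assuming the official openai Python SDK; reasoning_effort is the documented Chat Completions parameter, with "minimal" added for gpt-5):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # "minimal" spends almost nothing on reasoning tokens, which is what
    # cut latency for my quick summarization task.
    response = client.chat.completions.create(
        model="gpt-5",
        reasoning_effort="minimal",
        messages=[{"role": "user", "content": "Summarize in one sentence: ..."}],
    )
    print(response.choices[0].message.content)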

barrell 5 hours ago | parent

Reasoning was set to minimal and low (and I think I tried medium at some point). I do not believe the timeouts were due to the reasoning taking too long, although I never streamed the results. I think the model just fails often: it stops producing tokens and eventually the request times out.
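In case it's useful, the shape of what I was running was essentially this (a sketch with a hypothetical complete() helper, assuming the openai Python SDK; the 6-minute timeout and retries mirror the numbers above):

    import openai
    from openai import OpenAI

    # 6-minute timeout; disable the SDK's built-in retries so we count our own.
    client = OpenAI(timeout=360.0, max_retries=0)

    def complete(prompt: str, attempts: int = 3) -> str:
        for attempt in range(attempts):
            try:
                response = client.chat.completions.create(
                    model="gpt-5",
                    reasoning_effort="minimal",
                    messages=[{"role": "user", "content": prompt}],
                )
                return response.choices[0].message.content
            except openai.APITimeoutError:
                # The model just stops producing tokens; over half of first
                # attempts hit this, so retry from scratch.
                continue
        raise RuntimeError("completion failed after retries")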

barbazoo 6 hours ago | parent | prev

Hard to gauge what gibberish is without an example of the data and what you prompted the LLM with.

barrell 6 hours ago | parent

If you wanted examples, you needed only ask :)

These are screenshots from that week: https://x.com/barrelltech/status/1995900100174880806

I'm not going to share the prompt because (1) it's very long, (2) there were dozens of variations, and (3) it seems like poor business practice to share the most indefensible part of your business online XD

barbazoo 5 hours ago | parent | next

Surely reads like someone's brain transformed into a tree :)

Impressive. I haven't seen that myself; I've only used 5 conversationally, not via the API.

barrell 5 hours ago | parent

Heh, it's a quote from Archer (FX), and admittedly a poor machine translation; it's a very old expression of mine.

And yes, this only happens when I ask it to apply my formatting rules. If you let GPT do its own formatting, I would be surprised if this ever happened.

sandblast 6 hours ago | parent | prev

XD XD