barrell 6 hours ago
Yes. I spent about 3 days trying to optimize the prompt to get gpt-5 to stop producing gibberish, to no avail. Completions took several minutes, had an above-50% timeout rate (with a 6-minute timeout, mind you), and after retrying they would still return gibberish about 15% of the time (12% on one task, 20% on another). I then tried multiple models, and they all failed in spectacular ways. Only Grok and Mistral had an acceptable success rate, although Grok did not follow the formatting instructions as well as Mistral.

Phrasing is a language-learning application, so the formatting is very complicated, with multiple languages and multiple scripts intertwined with markdown formatting. I do include dozens of examples in the prompts, but it's something many models struggle with.

This was a few months ago, so to be fair, it's possible gpt-5.1 or gemini-3 or the new deepseek model has caught up. I have not had the time or need to compare, as Mistral has been sufficient for my use cases. I mean, I'd love to get that 0.1% error rate down, but there have always been more pressing issues XD
data-ottawa 6 hours ago
With gpt-5, did you try setting the reasoning level to "minimal"? I tried using it for a very small, quick summarization task that needed low latency, and any level above minimal took several seconds to return a response. Using minimal brought that down significantly. Weirdly, gpt-5's reasoning levels don't map directly onto the OpenAI API's reasoning-effort levels.
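For anyone wanting to try this: a minimal sketch of what the request looks like with the effort dialed down, shown here as a raw Chat Completions payload. The parameter name `reasoning_effort` follows the OpenAI API; the model name and the message content are placeholders, not a tested configuration.

```python
import json

# Hypothetical request payload for a low-latency summarization call.
# "minimal" skips most of the reasoning pass, which is what cut the
# multi-second latency in the comment above.
payload = {
    "model": "gpt-5",               # placeholder model name
    "reasoning_effort": "minimal",  # vs. "low" / "medium" / "high"
    "messages": [
        {"role": "user", "content": "Summarize this in one sentence: ..."},
    ],
}

print(json.dumps(payload, indent=2))
```

You'd POST this to the chat completions endpoint with your usual client; only the `reasoning_effort` field differs from a default call.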
| ||||||||||||||||||||||||||||||||
barbazoo 6 hours ago
Hard to gauge what "gibberish" means here without an example of the data and the prompt you gave the LLM.