rectang an hour ago

In my anecdotal experience there is a huge gap between GPT-5-mini, which hallucinates relentlessly, and Claude Opus or the latest GPTs, which are fairly reliable. I'm hoping that gap can be closed with improved approaches for small models, and that good reliability is achievable for LLMs without requiring absolutely mammoth computing resources.

For what it's worth, I also used GPT-5.2 (via duck.ai) this year for questions about taxes, and it was helpful — which makes sense: there's an abundance of material about taxes out there to be synthesized, so a text predictor trained in that domain should do well.