killerstorm 4 days ago

FWIW GPT-5 (and o3, etc.) is one of the most critical-minded LLMs out there.

If you ask for information that is, e.g., academic or technical, it will cite sources and compare different results, etc., without any extra prompt or reminder.

Grok 4 (at the initial release) just reported the information from the articles it found, without any analysis.

Claude Opus 4 also seems bad: I asked it for a list of JS libraries of a certain kind in deep research mode, and it returned a document focused on market share and usage statistics. It looks like it stumbled upon some articles of that kind and got carried away by them. Quite bizarre.

So GPT-5 is really good in comparison. Maybe not perfect in all situations, but perhaps better than an average human.

eru 4 days ago

> So GPT-5 is really good in comparison. Maybe not perfect in all situations, but perhaps better than an average human

Alas, the average human is pretty bad at these things.