littlestymaar 3 days ago

Writing philosophy that looks convincing has been something LLMs do well since the first release of ChatGPT back in 2022 (in my country, back in early 2023, TV featured a kind of competition between ChatGPT and a philosopher turned media personality, with university professors blind-reviewing both essays and trying to determine which was which).

To get an idea of how good a model is at non-STEM tasks, you need to challenge it on things that are harder for LLMs than this, like summarization without hallucination or creative writing. OpenAI's non-thinking models are usually very good at these, but their thinking models are not, whereas other players (be it Google, Anthropic or DeepSeek) manage to make models that can be very good at both.