dongobread 4 hours ago

What a strangely hostile statement on an open weight model. Running like 20 benchmark evaluations isn't trivial by itself, and even updating visuals and press statements can take a few days at a tech company. It's literally been 5 days since this "new generation" of models was released. GPT-5.3(-codex) can't even be called via API, so it's impossible to test on some benchmarks.

I notice that the people who endlessly praise closed-source models never actually USE open weight models, or they assume their drop-in prompting methods and workflows will just work for other model families. This is especially true for SWEs who used Claude Code first and now think every other model is horrible because they're ONLY used to prompting Claude. It's quite scary to see people develop this level of worship for a proprietary product that openly distrusts its users. I'm not saying this is or isn't true of the parent poster; it's just something I notice in general.

As someone who uses GLM-4.7 a good bit, it's easily at Sonnet 4.5 tier - have not tried GLM-5 but it would be surprising if it wasn't at Opus 4.5 level given the massive parameter increase.

maxdo an hour ago

But even Opus 4.5 is history now; GPT-5.3-codex and Opus 4.6 are another step forward. Opus itself caused a paradigm shift, from humans writing code with AI to AI writing code with a human.

Open weight models are not there at all yet.

apimade 2 hours ago

Isn’t trivial? How is it not completely automated at this point?