Remix.run Logo
seviu 4 hours ago

I am on the opposite camp. Open models are starting to perform better. GPT 5.5 keeps on messing things up.

On the contrary, pi + glm + DeepSeek… bliss.

Fable was a different kind of beast though. Rip.

square_usual 2 hours ago | parent | next [-]

Every time I use opus these days I go shut up... you are not fable.. Hard to imagine how just three days with it changed how I saw LLM use.

ftkftk an hour ago | parent [-]

Same.

baq 4 hours ago | parent | prev | next [-]

Yeah, Opus/GPT need multiple rounds of reviews from each other to get to clean auto review. Fable was like, it is done and indeed… crickets in bot comments. ‘No issues’ galore.

aaroninsf an hour ago | parent [-]

I wonder if this will hold as other models with different biases achieve parity.

arizen 4 hours ago | parent | prev | next [-]

Ditto on GLM 5.2 + DeepSeek V4 Flash combo.

For most important work (complex, cross-domain inquiries etc.), I still rely on Codex GPT 5.5 though.

whalesalad 3 hours ago | parent | prev | next [-]

GPT-5.5 has been really hard to beat imho. I've spent $$$ on Opus, Deepseek v4 Pro and recently started to dogfood GLM-5.2 (which is not bad) but I cannot really trust any of them (almost blind) like I can trust GPT-5.5. It gives me tremendous confidence. I cannot say the same for any of the others I mentioned.

baddash an hour ago | parent | prev | next [-]

how much does your setup cost you? just curious

enraged_camel 4 hours ago | parent | prev [-]

>> I am on the opposite camp. Open models are starting to perform better. GPT 5.5 keeps on messing things up.

I'm working in a 600k+ LoC codebase that has complex domain-specific logic and lots of moving parts. I find that Codex 5.5 is pretty good at surgical fixes, but does not go out of its way to explore and figure out what those surgical fixes might break. So I only use it to work on parts of the system that are pretty isolated from everything else so that risk of regression is small.