| ▲ | prmph 3 hours ago |
Nothing will really work when the models fail at the most basic of reasoning challenges. I've had models do the complete opposite of what I've put in the plan and guidelines. I've had them re-read the exact sentences and still come to the opposite conclusion, and my instructions are nothing complex at all. I used to think one could build a workflow and process around LLMs that extracts good value from them consistently, but I'm now not so sure. I notice that sometimes the model will be in a good state and do a long chain of edits of good quality. The problem is, getting them into that good state is still a crap-shoot.
| ▲ | mstank 2 hours ago | parent | next [-] |
In my experience this was an issue 6-8 months ago. Ever since Sonnet 4 I haven't had any issues with instruction following. The biggest step-change has been being able to one-shot file refactors (using the planning framework I mentioned above). 6 months ago refactoring was a very delicate dance, and now it feels pretty much streamlined.
| ▲ | hu3 2 hours ago | parent | prev | next [-] |
Check context size. LLMs become increasingly error-prone as their memory fills up, just like humans. In VSCode Copilot you can keep track of how many tokens the LLM is dealing with in real time with "Chat Debug". When it reaches 90k tokens I expect degraded intelligence and brace for a possible forced summarization. Sometimes I just stop the LLM and continue the work in a new session.
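If you want the same kind of check outside Copilot's "Chat Debug" view, you can approximate it by counting tokens yourself. A minimal sketch, assuming tiktoken's cl100k_base encoding as a stand-in tokenizer (it won't exactly match Claude's or Copilot's real counts) and treating the ~90k figure above as the budget:

```python
# Rough estimate of how close a conversation is to the ~90k-token point
# where output quality reportedly degrades. tiktoken's cl100k_base is an
# OpenAI encoding, so this is a ballpark figure, not an exact count.
import tiktoken

TOKEN_BUDGET = 90_000  # threshold mentioned above; adjust to taste

def estimate_tokens(messages: list[str]) -> int:
    enc = tiktoken.get_encoding("cl100k_base")
    return sum(len(enc.encode(m)) for m in messages)

def should_start_new_session(messages: list[str]) -> bool:
    used = estimate_tokens(messages)
    print(f"~{used} tokens in context ({used / TOKEN_BUDGET:.0%} of budget)")
    return used >= TOKEN_BUDGET

# Example: check a running transcript before sending the next prompt
transcript = ["Refactor the auth module to use the new session API.", "..."]
if should_start_new_session(transcript):
    print("Context is near the limit; consider starting a fresh session.")
```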
| ▲ | alienbaby 2 hours ago | parent | prev [-] |
I'm curious what kinds of situations you're seeing where the model consistently does the opposite of your intention even though the instructions were not complex. Do you have any examples?