canada_dry | 4 days ago
OpenAI's "PRO" subscription is really a waste of money IMHO for this and other reasons. Decided to give PRO a try when I kept getting terrible results from the $20 option. So far it's perhaps 20% improved in complex code generation. It still has the extremely annoying ~350 line limit in its output. It still IGNORES EXPLICIT CONTINUOUS INSTRUCTIONS eg: do not remove existing comments. The opaque overriding rules that - despite it begging forgiveness when it ignores instructions - are extremely frustrating!! | ||||||||
JoshuaDavid | 4 days ago | parent | next
One thing that has worked for me, when I have a long list of requirements / standards I want an LLM agent to stick to while executing a series of, say, 5 instructions, is to add extra steps at the end of the instructions, like "6. Check whether any of the code standards are not met - if so, fix them and return to step 5" / "7. Verify that no forbidden patterns from <list of things like no-op unit tests, N+1 query patterns, etc.> exist in the added code - if you find any, fix them and return to step 5", and so on (rough sketch below). Often the models are better at recognizing failures to stick to the rules and fixing them than they are at consistently following the rules in a single shot. This does mean that having an LLM agent do a thing often works but is slower than just doing it myself. Still, I can sometimes kick off a workflow before joining a meeting, so maybe the hours I've spent playing with these tools will eventually pay for themselves in improved future productivity.
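For what it's worth, here is a rough Python sketch of what I mean. The task steps, the standards list, and the fact that it just prints the assembled prompt are all made up for illustration; the only point is that the verification steps are appended to the same numbered instruction list and loop back to an earlier step, so the agent re-checks its own output instead of being trusted to comply in one pass.

    # Hypothetical sketch of the "append self-check steps" trick described above.
    # The steps and standards are invented; the assembled prompt would be sent to
    # whatever agent or chat interface you actually use.

    TASK_STEPS = [
        "1. Read the ticket and locate the affected module.",
        "2. Write the failing test first.",
        "3. Implement the fix.",
        "4. Update the docs.",
        "5. Run the full test suite.",
    ]

    CODE_STANDARDS = [
        "keep all existing comments",
        "no no-op unit tests",
        "no N+1 query patterns",
    ]

    # Verification steps that loop back to an earlier step on failure.
    CHECK_STEPS = [
        "6. Check whether any of these standards are violated: "
        + "; ".join(CODE_STANDARDS)
        + ". If any are, fix the code and return to step 5.",
        "7. Verify that no forbidden patterns were introduced. "
        "If you find any, fix them and return to step 5.",
    ]

    prompt = "\n".join(TASK_STEPS + CHECK_STEPS)
    print(prompt)  # paste/send this to the agent of your choice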
jmaker | 4 days ago | parent | prev
There are things it’s great at and things it deceives you with. In many cases where I needed it to check something I already knew was a problem, o3 kept insisting it was possible for reasons a, b, c, and thankfully gave me links. I knew it used to be a problem, so, surprised, I followed the links, only to read in black and white that it still wasn’t possible. So I explained to o3 that it was wrong. Two messages later we were back at square one. One week later it hadn’t updated its knowledge. Months later it’s still the same. But on topics I have no idea about, like medicine, it feels very convincing. Am I at risk? People don’t understand Dunning-Kruger. People are prone to biases and fallacies. Likely all LLMs are inept at objectivity. My instructions to LLMs always demand strictness, no false claims, and a Bayesian likelihood on every claim. Some models ignore those instructions outright, while others stick to them strictly. In the end it doesn’t matter, when they insist with 99% confidence on refuted fantasies.
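For concreteness, my standing instruction looks roughly like the sketch below. The exact wording, the [p=0.xx] tag format, and the little regex check are just an illustration of the idea (force the model to attach a likelihood to each claim, then mechanically flag untagged ones); I'm not claiming any particular model reliably honors it.

    import re

    # Illustrative only: a standing instruction that asks for a probability tag
    # on every claim, plus a trivial check that the tags are actually present.
    SYSTEM_PROMPT = (
        "Be strict. Make no claims you cannot support. "
        "Append a Bayesian likelihood in the form [p=0.xx] to every factual claim."
    )

    def untagged_claims(reply: str) -> list[str]:
        """Return sentences in the reply that carry no [p=...] tag."""
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", reply) if s.strip()]
        return [s for s in sentences if not re.search(r"\[p=0?\.\d+\]", s)]

    reply = "The API still lacks that endpoint [p=0.55]. It was fixed upstream."
    print(untagged_claims(reply))  # -> ['It was fixed upstream.']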