furyofantares a day ago

He graded GPT-4 as winning because it didn't follow his instructions. And the instructions are unrealistic for anyone actually using coding assistants.

Maybe it's true that for some very bad prompts, the old version did a better job by not following the prompt, and that this means reduced utility for some people.

Unrelated to assistants or coding, as an API user I've certainly had model upgrades that feel like downgrades at first, until I work out that the new model is following my instructions better. Sometimes my instructions were bad, sometimes they were attempts to get the older model to do what I wanted by saying over-the-top stuff that the new model now follows more precisely, with a worse result. So I can definitely imagine that new models can be worse until you adapt.

Actually, another strange example like this: I had gotten in the habit of typing extremely fast to LLMs because they worked just fine with my prompts riddled with typos. I basically disconnected the part of my brain that cares about sequencing between hands, so a word like "can" would come out as either "can" or "cna". This ended up causing problems with newer models, which take my typos seriously. For example, if I ask a newer model to add support for a command-line flag "allwo-netwokr-requests", it will usually do what I said, while previous versions would do what I wanted.

For anyone with some technical expertise who is putting serious effort into using AI coding assistants, they are clearly getting better at a rapid pace. Not worse.