> Also, a good few times, if it were a human doing the task, I would have said they both failed to follow the instructions and lied about it and attempted to pretend they didn’t.
It's funny. Just yesterday I had the experience of attending a concert under the strong — yet entirely mistaken — belief that I had already been to a previous performance of the same musician. It was only on the way back from the show, talking with my partner who attended with me (and who had seen this musician live before), trying to figure out when exactly "we" had last seen them, with me exhaustively listing out recollections that turned out to be of other (confusingly similar) musicians we had seen live together... that I finally realized I had never actually been to one of this particular musician's concerts before.
I think this is precisely the "experience" of being one of these LLMs. Except that, where I had a phantom "interpolated" memory of seeing a musician I had never actually seen, these LLMs have phantom, genuinely interpolated memories of performing skills they have never actually performed themselves.
Coding LLMs are trained to replicate pair-programming-esque conversations between people who actually do have these skills, and are performing them... but those conversations don't lay out all the many implicit micro-skills (thinking, probing, checking, recalling) involved in actually performing those skills. Instead, all you get in such a conversation thread is the conclusion each person reaches after applying those micro-skills.
And this leads to the LLM thinking it "has" a given skill... even though it doesn't actually know anything about "how" to execute that skill, in terms of the micro-skills that are used "off-screen" to come up with the final response given in the conversation. Instead, it just comes up with a prediction of what "someone using the skill" looks like... and thinks that means it has used the skill.
Even after a hole is poked in its use of the skill, and it realizes it made a mistake, that doesn't dissuade it from the belief that it has the given skill. Just like, even after I asked my partner about the show I recalled us attending, and she told me that it was a show by a different (but similar) musician, I still thought I had been to one of this musician's shows.
It took exhausting every possibility for when I could have seen this musician before to get me to even hypothesize that maybe I hadn't.
And it would likely take similarly exhaustive disproof (over hundreds of exchanges) to get an LLM to truly "internalize" that it doesn't actually have a skill it believed itself to have, and so stop trying to use it. (If that meta-skill is even a thing that LLMs have ever learned from their training data — which I doubt. And even if they did, you'd be wasting 90% of a Transformer's context window on this. Maybe something that's worth keeping in mind if we ever switch back to basing our LLMs on RNNs with true runtime weight updates, though!)