AI won't be put in important positions of responsibility within an organization because AI providers will never accept liability for bad decisions. You can't fire Claude if it fucks up, and it's got very limited ability to learn from its mistakes. It's also incapable of making good decisions where doing so requires synthesizing more than a few hundred thousand tokens worth of domain knowledge/experience in something that doesn't have an infinite amount of synthetic verifiable training data like code and math.

In theory continuous learning (live weight updates) could help to some degree. But there's essentially no progress towards that because it requires solving a few hard, currently completely unsolved problems:

1. Weights drift over time and there's no way to re-merge them after a few tens of thousands of updates, so when a new model version is released there'd be no way to bring existing continuously-learned models up to it.

2. It'd allow permanent jailbreaking.

3. A model can't learn new things without forgetting existing things, unlike human brains, which have hardware plasticity (like London taxi drivers having larger hippocampi from having to memorize so many streets).

sophiabits 4 hours ago

> You can't fire Claude if it fucks up

What's the difference between "firing" Claude vs moving to a model from a different provider? The latter seems very analogous to firing an employee for performance and backfilling with someone new.

Re the rest, it's just not my experience that models become incapable of making good decisions in cases where input token count > the context window, but ymmv based on domain.

A very extreme example of this: a couple of years ago, when GPT-4 was state of the art and the 32K context variant was gated to design partners, I worked at an EdTech company in the college admissions space that wanted to produce quarterly reports on student progress for parents. That involved crunching a LOT of data (multiple hours of meeting transcripts per week, very detailed notes about student activities, their general profile - UK and US admissions function very differently!).

It was a difficult problem, but we _did_ manage to produce these reports 4K output tokens at a time, at a level of quality that exceeded what humans could do internally, and models and the surrounding tooling have only gotten better since then.
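
To make the general idea concrete, here is a minimal sketch of one common way to work with more input than fits in a context window: summarize chunk by chunk, then summarize the summaries until the result fits a budget. This is an illustration only, not the pipeline described above. The names `split_into_chunks` and `compact_to_budget` are hypothetical, `summarize` is a stand-in for whatever model call you would use, and word counts are used as a rough proxy for tokens.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int) -> List[str]:
    """Split text into pieces of at most max_words whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def compact_to_budget(
    documents: List[str],
    summarize: Callable[[str], str],  # hypothetical stand-in for a model call
    max_words: int,                   # assumption: words approximate tokens
) -> str:
    """Reduce many documents to one text of at most max_words words."""
    # Map step: summarize every chunk of every document independently.
    notes = [
        summarize(chunk)
        for doc in documents
        for chunk in split_into_chunks(doc, max_words)
    ]
    combined = "\n".join(notes)

    # Reduce step: keep summarizing the summaries until they fit the budget.
    while len(combined.split()) > max_words:
        notes = [summarize(chunk) for chunk in split_into_chunks(combined, max_words)]
        shorter = "\n".join(notes)
        # Guard against a summarizer that never shrinks its input.
        if len(shorter.split()) >= len(combined.split()):
            raise ValueError("summarize() did not shrink the text; cannot reach the budget")
        combined = shorter
    return combined
```

The compacted text could then be passed as context when generating each section of a report, so that no single call needs the full raw input. How much quality survives each round of summarization depends heavily on the domain and the prompts.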

logicchains 3 hours ago

> What's the difference between "firing" Claude vs moving to a model from a different provider? The latter seems very analogous to firing an employee for performance and backfilling with someone new.

A human may learn and improve to avoid being fired, while Claude is incapable of that.

> Re the rest, it's just not my experience that models become incapable of making good decisions in cases where input token count > the context window, but ymmv based on domain.

If they've been trained a lot on your domain (maths, coding) then they can make good decisions. But I've just started using Mythos and even it makes some awful decisions in domains it's not trained on. Of course the majority of decisions are good, but it only takes a couple bad ones to sink a project.