Yann LeCun was saying 3 years ago that because token generation is autoregressive, it's mathematically impossible to generate a long stream of coherent tokens, because errors amplify exponentially.

and then models learned that they can back track and error correct

so much for "mathematically impossible..."
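
For readers who want the compounding-error argument in concrete terms, here is a minimal sketch. It assumes each token goes wrong independently with probability `err`, and that a single unrepaired error ruins the sequence; both are simplifications of the argument as summarized above, not a quote of it. The `p_repair` parameter is a hypothetical addition that models a chance of catching and fixing each error.

```python
def p_coherent(n_tokens: int, err: float, p_repair: float = 0.0) -> float:
    """Probability that all n_tokens tokens end up correct.

    Assumes independent per-token errors (a simplification). Each error is
    repaired with probability p_repair, a hypothetical parameter standing in
    for backtracking or external checks.
    """
    effective_err = err * (1.0 - p_repair)
    return (1.0 - effective_err) ** n_tokens


# Without repair the probability decays geometrically with length.
no_repair = p_coherent(1000, 0.01)
# With most errors repaired, the same length stays mostly coherent.
with_repair = p_coherent(1000, 0.01, p_repair=0.99)
```

With `err = 0.01` over 1000 tokens, the unrepaired probability is on the order of 1e-5, while repairing 99% of errors brings it to about 0.9. The decay is still geometric, but the base sits much closer to 1.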

charcircuit 6 hours ago

I think it was largely the introduction of tool calling that allowed models to mitigate exponentially amplifying errors, since it lets the model check whether what it generated is correct or has issues that it needs to address. This compensates for a missing or low-quality world model by letting the model reference the current state of the world.
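
A rough sketch of the generate, check, retry loop described here. All names (`generate_with_check`, `syntax_check`, `toy_model`) are hypothetical, the "tool" is just Python's built-in `compile()`, and the "model" is a stand-in function. Real tool calling is more involved; this only shows how external feedback stops one bad draft from propagating.

```python
from typing import Callable, Optional


def generate_with_check(
    generate: Callable[[Optional[str]], str],
    check: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> str:
    """Generate a candidate, run the checking tool, retry with its feedback."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        candidate = generate(feedback)
        feedback = check(candidate)  # None means the tool found no problem
        if feedback is None:
            return candidate
    raise RuntimeError(f"no passing output after {max_attempts} attempts: {feedback}")


def syntax_check(code: str) -> Optional[str]:
    """The 'tool': report a syntax error, or None if the code compiles."""
    try:
        compile(code, "<candidate>", "exec")
        return None
    except SyntaxError as exc:
        return f"SyntaxError: {exc.msg}"


def toy_model(feedback: Optional[str]) -> str:
    """Stand-in for a model: the first draft is broken, the revision is fixed."""
    return "print('hi'" if feedback is None else "print('hi')"


result = generate_with_check(toy_model, syntax_check)
```

The point of the design is that the checker's verdict comes from outside the generator, so a mistake is caught against the actual state of the world, not against the model's own belief about it.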

ravenstine 5 hours ago

I've definitely noticed this phenomenon after a few occasions of erroneously trying to rely purely on instructions to get an LLM to do a thing or take on a role, especially without persistent cloud-based sessions that have internal checklists and other opaque guidance. They're essentially poor at self-managing, but can do really well when they are limited in scope/context and are worked into a sort of state machine that guarantees they perform certain tasks predictably. They won't always do those tasks the exact way you expect them to, but at least they actually do them, and because of that they are more likely to have the correct prior context to perform the next task better. Left to manage themselves, they are so prone to selectively ignoring directions that they can quickly be sent down an incorrect path that compounds on itself.

shevy-java 5 hours ago

You insinuate here that AI "learned".

I doubt that this was AI self-improvement.

rcxdude 5 hours ago

Was there a particular change to the network or the way that it was trained that introduced the 'backtrack and error correct' mechanism?

aswegs8 3 hours ago

Does that take anything away from the argument?

card_zero 2 hours ago

What argument, "a theory was wrong"? No, the inane central observation, the observation that a researcher was unable to predict a discovery before it was discovered, remains true despite the gratuitous insertion of a little bit of bullshit about AI learning.

I suppose it's additionally trying to imply something else, like "due to a pattern of researchers being unable to discover discoveries before they discover them, AGI is just around the corner".

nok22kon an hour ago

it's one thing to say "we don't know"

it's a different thing to say "it's mathematically impossible"

so if it turns out it is possible, what then? was math broken? or is the researcher an idiot who either doesn't know math, or is just bullshitting nonexistent proofs?

card_zero an hour ago

Mathematics doesn't tell you what is necessarily true. It consists of guessing about what is necessarily true. (I don't know the details of what happened, it could be either or neither of the things you said.)

infinite_spin 5 hours ago

do you have a problem with this field of research being called "machine learning"?

TMWNN 5 hours ago

> and then models learned that they can back track and error correct

You mean "Human developers learned ...", yes? Or was there really an all AI-driven, self-improving aspect to this?

rcxdude 5 hours ago

Well, LLM networks don't have a 'back track and error correct' component in the design, AFAIK.

threethirtytwo 5 hours ago

Stop attacking Yann. I would say like 90% of the HN crowd was parroting Yann too.

jiggawatts 6 hours ago

Also, almost any argument against LLM intelligence also applies to humans.

I very commonly see someone make some small mistake and end up going in the wrong direction, “accumulating stupid” as they go, sometimes for years.

shevy-java 5 hours ago

Humans can learn.

AI cannot.

For those disagreeing: please explain how static hardware can learn.

echoangle 5 hours ago

By self-modifying the software. Currently the model harnesses only allow the model to modify its own prompt (which could be considered a really weak kind of learning), but theoretically, a model could design and train its own replacement and run that, continuously improving itself. I’m not sure if LLMs will be able to do that, but the static hardware has nothing to do with it (since the bits on the hard drive aren’t static).

someonebaggy 5 hours ago

idk, how does voice recognition learn my voice? How can I install programs when the hardware is static?

threethirtytwo 5 hours ago

This is profoundly false. AI not only can learn, it is built entirely from learning. The field is called machine learning after all.

Not only that... AI is NOT only learning during the training phase... LLMs learn in real time the minute you talk to them. They learn something and save it in the context window, or somewhere else if you want it to persist beyond the context window.

All of the above runs on static hardware. I don't understand how someone can make a profoundly wrong statement and get voted up.

guenthert 5 hours ago

Correct me if I'm wrong, but if a profound insight is gathered in session 1 with user A and stored in context A1, this might be available to user A in session 2, if that still has access to context A1, but won't be available to user B in any of his/her sessions until that NN is retrained with input which includes at least some of the information from context A1.

fragmede 5 hours ago

Same with the stochastic parrot thing. If you say just the right thing to the right human at the right time, they'll very predictably say their favorite movie/book quote or song lyric, like some sort of parrot.

dgellow an hour ago

An LLM will tell you how a song feels, even if it has literally no way to experience music. Because it's not thoughts or feelings that you get from an LLM. We take a massive amount of information, compress it into a large graph, then explore sections of the graph via prompts. That's what "stochastic parrot" means. And that doesn't compare with how humans think. It's just a completely different architecture.