ACCount37 3 hours ago

Anyone who says "I understand how it works" is completely full of shit.

Modern production-grade LLMs are entangled messes of neural connectivity, produced by inhuman optimization pressures more than intelligent design. Understanding the general shape of the transformer architecture does NOT automatically allow one to understand a modern 1T LLM built on top of it.

We can't predict the capabilities of an AI just by looking at the architecture and the weights - scaling laws only go so far. That's why we use evals. "Just go by behavior" is the industry standard of AI evaluation, and for a good damn reason. Mechanistic interpretability is in the gutter, and every little glimpse of insight we get from it is an uphill fight. We don't understand AI. We can only observe it.

"What can this thing realistically achieve?" Beat an average human on a good 90% of all tasks that were once thought to "require intelligence". Including tasks like NLP/NLU, tasks that were once nigh impossible for a machine because "they require context and understanding". Surely it was the other 10% that actually required "real intelligence", surely.

The gaps that remain are: online learning, spatial reasoning and manipulation, long horizon tasks and agentic behavior.

The fact that everything listed has mitigations (e.g. long context + in-context learning + agentic context management = dollar store online learning) or training improvements (multimodal training improves spatial reasoning, RLVR improves agentic behavior), and the performance on every metric rises release to release? That sure doesn't favor "those are fundamental limitations".

Doesn't guarantee that those will be solved in LLMs, no, but it goes to show that it's a possibility that cannot be dismissed. So far, the evidence looks more like "the limitations of LLMs are not fundamental" than "the current mainstream AI paradigm is fundamentally flawed and will run into a hard capability wall".

qsera 3 hours ago

Do yourself a favor and watch, very carefully, the video podcast shared in the following comment.

https://news.ycombinator.com/item?id=47421522

ACCount37 2 hours ago

Frankly, I don't buy that LeCun has that much of use to say about modern AI. Certainly not enough to justify an hour-long podcast.

Don't get me wrong, he has some banger prior work, and the recent SIGReg did go into my toolbox of dirty ML tricks. But the JEPA line is rather disappointing overall, and his distaste for LLMs seems to be a product of his personal aesthetic preference on research direction rather than any fundamental limitations of transformers. There's a reason why he got booted out of Meta - and it's his failure to demonstrate results.

That talk of "true understanding" (define true) that he's so fond of seems to be a flimsy cover for "I don't like the LLM direction and that's all everyone wants to do these days". He kind of has to say "LLMs are fundamentally broken", because if they aren't, if better training is all it takes to fix them, then why the fuck would anyone invest money into his pet non-LLM research projects?

It is an uncharitable read, I admit. But I have very little charity left for anyone who says "LLMs are useless" in the year 2026. Come on. Look outside. Get a reality check.

qsera an hour ago

My opinions on the matter do not come from any experts; they come from my own reasoning. I hadn't seen that video before I came across that comment.

>"LLMs are useless" in year 2026

Literally no one is saying this. Those words are just being put into the mouths of people who do not share the delusional wishful thinking of the "true believers" of LLM AI.

qsera 3 hours ago

Mm... You seem to consider this to be some mystical entity, and I think that kind of delusional idea might be a good indication that you are experiencing the ELIZA effect...

>We don't understand AI. We can only observe it.

Lol what? Height of delusion!

> Beat an average human on a good 90% of all tasks that were once thought to "require intelligence".

This is done by mapping those tasks to some representation that a non-intelligent automation can process. That is essentially what part of unsupervised learning does.