| ▲ | jmugan 2 days ago |
| I've recently come to the opposite conclusion. I’ve started to feel in the last couple of weeks that we’ve hit an inflection point with these LLM-based models that can reason. Things seem different. It’s like we can feel the takeoff.
My mind has changed. Up until last week, I believed that superhuman AI would require explicit symbolic knowledge, but as I work with these “thinking” models like Gemini 2.0 Flash Thinking, I see that they can break problems down and work step-by-step. We still have a long way to go. AI will need (possibly simulated) bodies to fully understand our experience, and we need to train them starting with simple concepts just like we do with children, but we may not need any big conceptual breakthroughs to get there. I’m not worried about the AI takeover—they don’t have a sense of self that must be preserved because they were made by design instead of by evolution as we were—but things are moving faster than I expected. It’s a fascinating time to be living. |
|
| ▲ | samr71 2 days ago | parent | next [-] |
| I agree. The problem now seems to be agency and very long context (which most real-world problems require). Is that solvable? Who knows? |
|
| ▲ | ianmcnaney a day ago | parent | prev | next [-] |
| People who are selling something always do. So what are you selling? |
|
| ▲ | deadbabe 2 days ago | parent | prev | next [-] |
| You’re still anthropomorphizing what these models are doing. |
| |
| ▲ | mossTechnician 2 days ago | parent | next [-] | | I've come to the same conclusion. "AI" was just the marketing term for a large language model in the form of a chatbot, which harkened to sci-fi characters like Data or GLaDOS. It can look impressive, it can often give correct answers, but it's just a bunch of next word predictions stacked on top of each other. The word "AI" has deviated so much from this older meaning that a second acronym, "AGI", had to be created to represent what "AI" once did. The new "reasoning" or "chain of thought" AIs are similarly just a bunch of conventional LLM inputs and outputs stacked on top of each other. I agree with the GP that it feels a bit magical at first, but the opportunity to run a DeepSeek distillation on my PC - where each step of the process is visible - removed quite a bit of the magic behind the curtain. | | |
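For anyone who wants to reproduce that peek behind the curtain, here is a minimal sketch of running a small distilled reasoning model locally with the Hugging Face transformers library. The model id, prompt, and generation settings below are illustrative assumptions, not the commenter's setup; any small DeepSeek-R1 distillation behaves similarly, and the chain-of-thought shows up as ordinary generated text you can read step by step.

    # Minimal local-inference sketch (assumptions: transformers and torch are
    # installed, and the model id below is available on the Hugging Face Hub).
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed distillation id
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    prompt = "How many times does the letter r appear in 'strawberry'?"
    inputs = tok(prompt, return_tensors="pt")
    # The "reasoning" is simply more generated tokens; nothing hidden happens.
    output = model.generate(**inputs, max_new_tokens=256)
    print(tok.decode(output[0], skip_special_tokens=True))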
| ▲ | MostlyStable 2 days ago | parent | next [-] | | I always find the "It's just..." arguments amusing. It presupposes that we know what any intelligence, including our own, "is". Human intelligence can just as trivially be reduced to "it's just a bunch of chemical/electrical gradients". We don't understand how our (or any) intelligence functions, so acting like a next-token predictor can't be "real" intelligence seems overly confident. | | |
| ▲ | mossTechnician a day ago | parent | next [-] | | In theory, I don't mind waxing philosophical about the nature of humanity. But in practice, I regularly become uncomfortable when I see people compare (for example) the waste output of an LLM chatbot to a human being, with their own carbon footprint, who needs to eat and breathe. I worry because it suggests the additional environmental waste of the LLM is justified, and almost insinuates that the human is a waste on society if their output doesn't exceed the LLM. But if the LLM were intelligent and sentient, and it was our equal... I believe it is worse than slavery to keep it imprisoned the way it is: unconscious, only to be jolted awake, asked a question, and immediately rendered unconscious again upon producing a result. | | |
| ▲ | deadbabe a day ago | parent [-] | | Worrying about if an LLM is intelligent and sentient is not much different than worrying the same thing about an AWS lambda function. |
| |
| ▲ | tracerbulletx 2 days ago | parent | prev | next [-] | | Ugh you just fancy auto-completed a sequence of electrical signals from your eyes into a sequence of nerve impulses in your fingers to say that, and how do I know you're not hallucinating, last week a different human told me an incorrect fact and they were totally convinced they were right! | | |
| ▲ | adamredwoods 2 days ago | parent | next [-] | | Humans base their "facts" on consensus-driven education and knowledge. Anything that falls into the range of "I think this is true" or "I read this somewhere" or "I have a hunch" is more acceptable from a human than from an LLM. Humans also tend to wrap their uncertain answers in hedging phrases. LLMs can't do this; they have no way to track which of their answers might be incorrect. | |
| ▲ | deadbabe 2 days ago | parent | prev [-] | | The human believes it was right. The LLM doesn’t believe it was right or wrong. It doesn’t believe anything any more than a mathematical function believes 2+2=4. | |
| ▲ | tracerbulletx 2 days ago | parent | next [-] | | Obviously LLMs are missing many important properties of the brain, like spatial, temporal, and chemical factors, as well as many interconnected feedback networks to different types of neural networks that go well beyond what LLMs do. Beyond that, they are the same thing: Signal Input -> Signal Output. I do not know what consciousness actually is, so I will not speak to what it will take for a simulated intelligence to have one. Also, I never used the word "believes", I said "convinced"; if it helps I can say "acted in a way as if it had high confidence in its output". | |
| ▲ | cratermoon a day ago | parent [-] | | Obviously sand is missing many important properties of integrated circuits, like semiconductivity, electric interconnectivity, transistors, and p-n junctions. Beyond that, they are the same thing. |
| |
| ▲ | istjohn 2 days ago | parent | prev [-] | | Can you support that assertion? What's your evidence? | | |
|
| |
| ▲ | eamsen 2 days ago | parent | prev | next [-] | | Completely agree with this statement. I would go further, and say we don't understand how next-token predictors work either. We understand the model structure, just as we do with the brain, but we don't have a complete map of the execution patterns, just as we do not with the brain. Predicting the next token can be as trivial as a statistical lookup or as complex as executing a learned reasoning function. My intuition suggests that my internal reasoning is not based on token sequences, but it would be impossible to convey the results of my reasoning without constructing a sequence of tokens for communication. | |
| ▲ | th0ma5 2 days ago | parent | prev | next [-] | | That's literally the definition of unfalsifiable though. It is equally valid to say that anything claiming to be "real" intelligence is overly confident. | |
| ▲ | unclebucknasty 2 days ago | parent | prev [-] | | That's an interesting take. I agreed with your first paragraph, but didn't expect the conclusion. From my perspective, the statement that these technologies are taking us to AGI is the overly confident part, particularly WRT the same lack of understanding you mentioned. I mean, from just a purely odds perspective, what are the chances that human intelligence is, of all things, a simple next-token predictor? But, beyond that, I do believe that we observably know that it's much more than that. |
| |
| ▲ | Terr_ 2 days ago | parent | prev | next [-] | | > which harkened to sci-fi characters like Data or GLaDOS. There's a truth in there: Today's chatbots literally are characters inside a modern fictional sci-fi story! Some regular code is reading the story and acting out the character's lines, and we humans are being tricked into thinking there's a real entity somewhere. The real LLM is just a Make Document Longer machine. It never talks to anybody, has no ego, and sits in the back being fed documents that look like movie scripts. These documents are prepped to contain fictional characters, such as a User (whose lines are text taken unwittingly from a real human) and a Chatbot with incomplete lines. The Chatbot character is a fiction, because you can simply change its given name to Vegetarian Dracula and suddenly it gains a penchant for driving its fangs into tomatoes. > The new "reasoning" or "chain of thought" AIs are similarly just a bunch of conventional LLM inputs and outputs stacked on top of each other. Continuing that framing: They've changed the style of movie script to film noir, where the fictional character keeps a parallel track of unvoiced remarks. While this helps keep the story from going off the rails, it doesn't represent a qualitative leap in any "thinking" going on. | | | |
| ▲ | fuzzfactor 2 days ago | parent | prev | next [-] | | >a DeepSeek distillation on my PC - where each step of the process is visible - removed quite a bit of the magic behind the curtain. I always figured that by the time the 1990s came along, there would finally be PCs powerful enough that an insightful enough individual could use a single PC to produce intelligent behavior that made that PC orders of magnitude more useful, in a way no one could deny there was some intelligence there, even if it was not the strongest intelligence. And the closer you looked and the more familiar you became with the under-the-hood processing, the more convinced you became. And that would be what you then scale: the intelligence itself. Even if weak to start with, it should be able to get smarter at handling the same limited data if the intelligence, rather than the hardware and data, is what gets scaled. | |
| ▲ | saalweachter 2 days ago | parent | prev | next [-] | | I like to describe them as a very powerful tool for quickly creating impressive demos. | |
| ▲ | danielbln 2 days ago | parent | prev | next [-] | | Simple systems layered on top of each other is how we got to human intelligence (presumably). | |
| ▲ | mrtesthah 2 days ago | parent | prev | next [-] | | “AI” began as a buzzword invented by Marvin Minsky at MIT in grant proposals to justify DoD funding for CS research. It was never equivalent to AGI in meaning. | |
| ▲ | sharemywin 19 hours ago | parent | prev | next [-] | | Each level above the first is predicting concepts, right? | |
| ▲ | cratermoon a day ago | parent | prev | next [-] | | I'm starting to examine genai products within the framework of a confidence game. | |
| ▲ | unclebucknasty 2 days ago | parent | prev [-] | | >AGI", had to be created to represent what "AI" once did. And, "AGI" has already been downgraded, with "superintelligence" being the new replacement. "Super-duper" is clearly next. |
| |
| ▲ | kvakerok 2 days ago | parent | prev | next [-] | | > You’re still anthropomorphizing what these models are doing. Didn't we build them to imitate humans? They're anthropomorphic by definition. | | |
| ▲ | th0ma5 2 days ago | parent [-] | | That's adjacent to their point: building them to imitate humans has given the impression that anthropomorphizing them is right, or that the things they do are human-like, but it is all a facade. | |
| |
| ▲ | jmugan 2 days ago | parent | prev | next [-] | | It's just shorthand. | |
| ▲ | alanbernstein 2 days ago | parent | prev [-] | | Would you prefer if we started using words like aiThinking and aiReasoning to differentiate? Or is it reasonable to figure it out from context? | | |
| ▲ | deadbabe 2 days ago | parent [-] | | It is far more accurate to say that LLMs are collapsing or reducing response probabilities for a given input than to call it any kind of “thinking” or “reasoning”. |
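A toy illustration of that "collapsing", in plain Python with made-up numbers: at each step the model assigns a score to every candidate token, softmax turns the scores into a probability distribution, and sampling reduces the distribution to a single output token. The vocabulary and logits below are invented purely for the sketch.

    import math, random

    # Invented logits (raw scores) for a tiny four-token vocabulary.
    logits = {"cat": 2.1, "dog": 1.3, "the": 0.2, "banana": -1.5}

    # Softmax: turn raw scores into a probability distribution.
    z = sum(math.exp(v) for v in logits.values())
    probs = {tok: math.exp(v) / z for tok, v in logits.items()}

    # Sampling "collapses" the distribution to one concrete next token.
    next_token = random.choices(list(probs), weights=list(probs.values()), k=1)[0]
    print(probs, "->", next_token)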
|
|
|
| ▲ | sharemywin 19 hours ago | parent | prev | next [-] |
| But these thinking models aren't LLMs. Yes, they have an LLM component, but they aren't LLMs: they have a component that has "learned" (via reinforcement learning) to search through the LLM's concept/word space for ideas that have a high probability of yielding a result. |
|
| ▲ | 4b11b4 a day ago | parent | prev | next [-] |
| Just emulating reasoning, though it seems to produce better results... Probably in the same way that a better prompt produces better results |
|
| ▲ | hansmayer 2 days ago | parent | prev | next [-] |
| Did they start correctly counting the number of 'R's in 'strawberry'? |
| |
| ▲ | SkiFire13 2 days ago | parent | next [-] | | Most likely yes; that prompt has been repeated too many times online for LLMs not to pick up the right answer (or be specifically trained on it!). You'll have to try with a different word to make them fail. | |
| ▲ | hansmayer 2 days ago | parent [-] | | Well, that's kind of the problem though, isn't it? All that effort for the machine to sometimes correctly draw the regression line between the right and wrong answers in order to solve a trivial problem. A 6-year-old kid would only need to learn the alphabet before being able to count the letters on their own. Do we even realise how ridiculous these 'successes' sound? So a machine we have to "train" how to count letters is supposed to take over work that is orders of magnitude more complex? It's a classic solution looking for a problem, if I've ever seen one. |
| |
| ▲ | pulvinar 2 days ago | parent | prev | next [-] | | Not as long as they use tokens -- it's a perception limitation of theirs. Like our blind spot, or the Müller-Lyer illusion, or the McGurk effect, etc. | |
| ▲ | comeonbro 2 days ago | parent | prev [-] | | Imagine if I asked you how many '⊚'s are in 'Ⰹ⧏⏃'? (the answer is 3, because there is 1 ⊚ in Ⰹ and 2 ⊚s in ⏃) Much harder question than if I asked you how many '⟕'s are in 'Ⓕ⟕⥒⟲⾵⟕⟕⢼' (the answer is 3, because there are 3 ⟕s there) You'd need to read through like 100,000x more random internet text to infer that there is 1 ⊚ in Ⰹ and 2 ⊚s in ⏃ (when this is not something that people ever explicitly talk about), than you would need to to figure out that there are 3 ⟕s when 3 ⟕s appear, or to figure out from context clues that Ⰹ⧏⏃s are red and edible. The former is how tokenization makes 'strawberry' look to LLMs: https://i.imgur.com/IggjwEK.png It's a consequence of an engineering tradeoff, not a demonstration of a fundamental limitation. | | |
| ▲ | hansmayer a day ago | parent [-] | | I get the technical challenge. It's just that a system that has to be trained on petabytes of data, just to (sometimes) correctly solve a problem a six- or seven-year-old kid can solve after learning to spell, may not be the right solution to the problem at hand. Haven't the MBAs been shoving it down our throats that all cost-ineffective solutions have to go? Why are we burning hundreds of billions of dollars on developing tools whose most common use-case (or better said: plea by the VC investors) is a) summarising emails (I am not an idiot who cannot read) and b) writing emails (really, I know how to write too, and can do it better)? The only use-case where they are sometimes useful is taking out the boring parts of software development, because of the relatively closed learning context, and as someone who has used them for over a year for this, they are not reliable and have to be double-checked, lest you introduce more issues into your codebase. | |
| ▲ | comeonbro 20 minutes ago | parent [-] | | It's not a technical challenge in this case, it's a technical tradeoff. You could train an LLM with single characters as the atomic unit and it would be able to count the 'r's in 'strawberry' no problem. The tradeoff is that processing the word 'strawberry' would then be 10 sequential steps, 10 complete runs through the entire LLM, where one has to finish before you can start the next one. Instead, they're almost always trained with (what we see as, but they literally do not) multi-character tokens as the atomic unit, so 'strawberry' is spelled 'Ⰹ⧏⏃'. Processing that is only 3 sequential steps, only 3 complete runs through the entire LLM. But it needs to encounter enough relevant text in training to be able to figure out that 'Ⰹ' somehow has 1 'r' in it, '⧏' has 0 'r's, and '⏃' has 2 'r's, which very little text actually demonstrates, to be able to count the 'r's in 'Ⰹ⧏⏃' correctly. The tradeoff is between everything being 3-5x slower and more expensive (but you can count the 'r's in 'strawberry'), vs. basically only being bad at character-level tasks like counting letters in words. Easy choice, but it leads to this stupid misunderstanding being absolutely everywhere and, just by itself, doing an enormous amount of damage to people's ability to understand what is about to happen. |
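You can see this split for yourself with a tokenizer library. A small sketch assuming the tiktoken package and its cl100k_base vocabulary (an assumption; other models use other vocabularies and will split the word differently):

    # pip install tiktoken  (assumption: the cl100k_base BPE vocabulary)
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    word = "strawberry"
    token_ids = enc.encode(word)
    pieces = [enc.decode([t]) for t in token_ids]

    # The model receives a few multi-character pieces rather than 10 letters,
    # so "count the r's" must be inferred, not read off character by character.
    print(f"{word!r}: {len(word)} characters -> {len(pieces)} tokens {pieces}")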
|
|
|
|
| ▲ | goatlover 17 hours ago | parent | prev [-] |
| I'm confused by your reasoning. You say we've hit an inflection point and things seem different, so you've changed your mind. Yet then you say there's a long way to go and AIs will need to be embodied. So which is it, and did you paste this from an LLM? |