| ▲ | _dwt 2 days ago |
| I have a question for all the "humans make those mistakes too" people in this thread, and elsewhere: have you ever read, or at least skimmed a summary of, "The Origin of Consciousness in the Breakdown of the Bicameral Mind"? Did you say "yeah, that sounds right"? Do you feel that your consciousness is primarily a linguistic phenomenon? I am not trying to be snarky; I used to think that intelligence was intrinsically tied to or perhaps identical with language, and found deep and esoteric meaning in religious texts related to this (e.g. "in the beginning was the Word"; logos as soul as language-virus riding on meat substrate). The last ~three years of LLM deployment have disabused me of this notion almost entirely, and I don't mean in a "God of the gaps" last-resort sort of way. I mean: I see the output of a purely-language-based "intelligence", and while I agree humans can make similar mistakes/confabulations, I overwhelmingly feel that there is no "there" there. Even the dumbest human has a continuity, a theory of the world, an "object permanence"... I'm struggling to find the right description, but I believe there is more than language manipulation to intelligence. (I know this is tangential to the article, which is excellent as the author's usually are; I admire his restraint. However, I see exemplars of this take all over the thread, so: why not here?) |
|
| ▲ | nine_k 2 days ago | parent | next [-] |
| If you look at different ancient traditions, you will notice how they struggle with the limitations of language, with its inability to represent certain things that are not only crucial for understanding the world but even, somehow, communicable. Buddhists dug into that in a very analytical, articulate way, for instance. Another perspective: cetaceans are considered to be as conscious as humans, but all attempts to interpret their communication as a language have failed so far. They can be taught simple languages to communicate with humans, as can chimps. But apparently that's not how they process the world internally. |
| |
| ▲ | gbgarbeb 2 days ago | parent [-] | | You're a little out of date. Cetaceans communicate images to each other in the form of ultrasonic chirps. They chirp, they hear a reflection, and they repeat the reflection. | | |
| ▲ | nine_k 2 days ago | parent [-] | | Does this resemble human language, with syntax, the ability to define new notions based on known notions, etc? |
|
|
|
| ▲ | lp4v4n a day ago | parent | prev | next [-] |
| >I am not trying to be snarky; I used to think that intelligence was intrinsically tied to or perhaps identical with language I learned a long time ago that this wasn’t the case. I can speak several languages, and many times when I remember something and want to search for it on Google or any other AI engine, I can’t recall which language I originally read it in. So whatever mechanism the brain uses to store information, it’s certainly language‑agnostic. There are also many moments when you fully grasp a concept but forget the words to describe it, yet the concept itself remains clear in your mind. |
|
| ▲ | kgeist a day ago | parent | prev | next [-] |
| >and while I agree humans can make similar mistakes/confabulations, I overwhelmingly feel that there is no "there" there. What really opened my eyes a couple of weeks ago (anyone can try this): I asked Sonnet to write an inference engine for Qwen3, from scratch, without any dependencies, in pure C. I gave it GGUF specs for parsing (to quickly load existing models) and Qwen3's architecture description. The idea was to see the minimal implementation without all the framework fluff, or abstractions. Sonnet was able to one-shot it and it worked. And you know what, Qwen3's entire forward pass is just 50 lines of very simple code (mostly vector-matrix multiplications). The forward pass is only part of the story; you just get a list of token probabilities from the model, that is all. After the pass, you need to choose the sampling strategy: how to choose the next token from the list. And this is where you can easily make the whole model much dumber, more creative, more robotic, or make it collapse entirely just by choosing different decoding strategies. So a large part of a model's perceived performance/feel is not even in the neurons, but in some hardcoded, manually written function. Then I also performed "surgery" on this model by removing/corrupting layers and seeing what happens. If you do this exercise, you can see that it's not intelligence. It's just a text transformation algorithm. Something like a "semantic template matcher". It generates output by finding, matching and combining several prelearned semantic templates. A slight perturbation in one neuron can break the "finding" part and it collapses entirely: it can't find the correct template to match and the whole illusion of intelligence breaks. Its corrupted output is what you expect from corrupting a pure text manipulation algorithm, not a truly intelligent system. |
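The point about decoding strategies can be made concrete with a toy sketch (Python rather than the C code discussed above; the four-token vocabulary and logit values here are invented for illustration). The model hands back nothing but a list of scores, and a hand-written decoding rule turns that into behavior:

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities; temperature rescales them first."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(logits):
    """Deterministic decoding: always pick the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])

def sample(logits, temperature, rng):
    """Stochastic decoding: draw a token from the temperature-scaled distribution."""
    probs = softmax(logits, temperature)
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

# Toy logits for a 4-token vocabulary -- the "model output" never changes.
logits = [2.0, 1.0, 0.5, -1.0]

print(greedy(logits))                          # 0: greedy always picks the top score
print(sample(logits, 0.1, random.Random(42)))  # low temperature: near-greedy
print(sample(logits, 10.0, random.Random(42))) # high temperature: near-uniform, any token
```

Greedy decoding is deterministic and "robotic"; raising the temperature flattens the distribution until the output is effectively random, all without touching a single weight.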
| |
| ▲ | famouswaffles a day ago | parent [-] | | >And you know what, Qwen3's entire forward pass is just 50 lines of very simple code (mostly vector-matrix multiplications). The code being simple doesn't mean much when all the complexity is encoded in billions of learned weights. The forward pass is just the execution mechanism. Conflating its brevity with the simplicity of the underlying computation is a basic misunderstanding of what a forward pass actually is. What you've just said is the equivalent of saying blackbox.py is simple because 'python blackbox.py' only took 1 line. It's just silly reasoning. >After the pass, you need to choose the sampling strategy: how to choose the next token from the list. And this is where you can easily make the whole model much dumber, more creative, more robotic, make it collapse entirely by just choosing different decoding strategies. So a large part of a model's perceived performance/feel is not even in the neurons, but in some hardcoded manually-written function. So? I can pick the least likely token every time. The result would be garbage, but that doesn't say anything about the model. The popular strategy is to randomly pick from the top n choices. What do you think is keeping thousands of tokens coherent and on point even with this strategy? Why don't you try sampling without a large language model to back it and see how well that goes for you? >Then I also performed "surgery" on this model by removing/corrupting layers and seeing what happens. If you do this exercise, you can see that it's not intelligence. It's just a text transformation algorithm. Something like "semantic template matcher". It generates output by finding, matching and combining several prelearned semantic templates. A slight perturbation in one neuron can break the "finding part" and it collapses entirely: it can't find the correct template to match and the whole illusion of intelligence breaks. Its corrupted output is what you expect from corrupting a pure text manipulation algorithm, not a truly intelligent system. What do you think happens when you remove or corrupt arbitrary regions of the human brain? People can lose language, vision, memory, or reasoning, sometimes catastrophically. | | |
| ▲ | kgeist a day ago | parent [-] | | >The code being simple doesn't mean much when all the complexity is encoded in billions of learned weights. The forward pass is just the execution mechanism. Conflating its brevity with simplicity of the underlying computation is a basic misunderstanding of what a forward pass actually is. What you've just said is the equivalent of saying blackbox.py is simple because 'python blackbox.py' only took 1 line. It's just silly reasoning. Look at what a transformer actually does. Attention is a straightforward dictionary lookup in like 3 matmuls. An FFN is a simple space-transform rule with a non-linear cutoff to adjust the signal (i.e. a few more matmuls and an activation function) before doing a new dictionary lookup in the next transformer block. Add a few tricks like residual connections and output projections, and repeat N times. So yeah, the actual inference code is 50 lines, and the rest is large learned dictionaries to search in, with some transforms. So you're saying my one-liner program that consults a DB with 1 million rows is actually 1 million lines of code? Well, not quite. This trick, coupled with lots of prelearned templates, is enough to fool people into believing there's "there" there (the OP's post above). Just like ELIZA back in the day. Apparently this trick is enough to solve lots of problems, because lots of problems only require search in a known problem (template) space (also with reduced dimensionality). But it's still just a fancy search algorithm. I think the whole thing about "emergent behavior" is that when a human is confronted with a huge prelearned concept space, it's so large they cannot digest what is actually happening, and they tend to ascribe magical properties to it like "intelligence" or "consciousness". For example, imagine there was a huge precreated IF..THEN table for every possible question/answer pair a finite human might ask in their lifetime.
It would appear to the human that there's intelligence, that there's "there" there. But at the end of the day it would be just a static table with nothing really interesting happening inside it. A transformer is just a nice trick that compresses this huge IF..THEN table into a few hundred gigabytes. >So? I can pick the least likely token every time. The result would be garbage but that doesn't say anything about the model. The popular strategy is to randomly pick from the top n choices. What do you think is keeping thousands of tokens coherent and on point even with this strategy? Why don't you try sampling without a large language model to back it and see how well that goes for you? I was referring to the OP's point: there is no "there" there.
It doesn't even "know" what the actual text continuation must be, strictly speaking. It just returns a list of probabilities that we must select from. It can't select it itself. To go from "list of probabilities" to "chatbot" requires adding additional hardcoded code (no AI involved) that greatly influences how the chatbot behaves and feels. Imagine if an actual sentient being had a button: you press it, and suddenly Steven the sailor becomes a Chinese lady who discusses Confucius. Or starts saying random gibberish. There's no independent agency whatsoever. It's all a bunch of clever tricks. >What do you think happens when you remove or corrupt arbitrary regions of the human brain? People can lose language, vision, memory, or reasoning, sometimes catastrophically. In an actual brain, the structure of the connectome itself drives a lot of behavior. In an LLM, all connections are static and predefined. A brain is much more resistant to failure; in an LLM, changing a single hypersensitive neuron can lead to full model collapse. There are humans who live normal lives with a full hemisphere removed. | | |
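For reference, the "dictionary lookup in like 3 matmuls" description of attention can be sketched in pure Python (toy dimensions and values, invented for illustration; real models add learned Q/K/V projections, multiple heads, and causal masking):

```python
import math

def matmul(A, B):
    """Naive matrix multiply over plain nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def softmax_rows(M):
    """Row-wise softmax."""
    out = []
    for row in M:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(Q[0])
    Kt = [list(col) for col in zip(*K)]                    # K^T
    scores = matmul(Q, Kt)                                 # query-key similarities
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = softmax_rows(scaled)                         # "which keys match the query"
    return matmul(weights, V)                              # weighted lookup of values

# Toy example: 2 tokens, dimension 2. The query matches the first key strongly.
Q = [[10.0, 0.0]]
K = [[10.0, 0.0], [0.0, 10.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
print(attention(Q, K, V))  # close to [[1.0, 0.0]]
```

A query that strongly matches one key retrieves (mostly) that key's value: a soft, differentiable lookup.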
| ▲ | famouswaffles 18 hours ago | parent [-] | | I get irritated when people act like they know what they are talking about when it's just nonsense they keep spitting out. I'm honestly sick of it. There's a fair amount of LLM interpretability research out there. If you're actually interested in knowing better, then go read it. I'll even link what I find interesting. All this talk of lookup tables is nonsensical. You have no idea what you're talking about. >It doesn't even "know" what the actual text continuation must be, strictly speaking. It just returns a list of probabilities that we must select. It can't select it itself. To go from "list of probabilities" to "chatbot" requires adding additional hardcoded code (no AI involved) that greatly influences how the chatbot behaves, feels. Imagine if an actual sentient being had a button: you press it, and suddenly Steven the sailor becomes a Chinese lady who discusses Confucius. Or starts saying random gibberish. There's no independent agency whatsoever. It's all a bunch of clever tricks. You are not making any sense here. Producing a probability distribution over next tokens is the model's decision procedure. Sampling is just the readout rule for turning that distribution into a concrete sequence. Yes, decoding choices affect style, creativity, determinism, and failure modes. That is true. It does not follow that the model is therefore "just tricks" or that the intelligence-like behavior lives outside the network. >In an actual brain, the structure of the connectome itself drives a lot of behavior. In an LLM, all connections are static and predefined. A brain is much more resistant to failure. In an LLM changing a single hypersensitive neuron can lead to a full model collapse. There are humans who live normal lives with a full hemisphere removed. You are moving the goalposts. Fact is: randomly corrupting a system damages it. This is not a meaningful test of whether a system is "truly intelligent."
Random lesions to human cortex are also catastrophic. The hemispherectomy cases you mention involve surgical removal of diseased tissue with significant neural reorganization over time, not random weight corruption. That's not even a fair comparison. LLMs are also deeply redundant. If they weren't, techniques like quantization or layer pruning wouldn't work. |
|
|
|
|
| ▲ | pocksuppet 2 days ago | parent | prev | next [-] |
| > In the beginning were the words, and the words made the world. I am the words. The words are everything. Where the words end the world ends. You cannot go forward in an absence of space. Repeat: In the beginning were the words... - a self-aware computer program in a video game, when you attempt to exceed the boundaries of its code |
|
| ▲ | xandrius 2 days ago | parent | prev | next [-] |
| It feels like you've probably gone too deep down the LLM rabbit hole. An LLM is a statistical next-token machine trained on all the stuff people wrote/said. It blends texts together in a way that still makes sense (or no sense at all). Imagine you made a super simple program which would answer yes/no to any question by generating a random number. It would get things right 50% of the time. You can then fine-tune it to say yes more often to certain keywords and no to others. Just with a bunch of hardcoded paths you'd probably fool someone into thinking that this AI has superhuman predictive capabilities. This is what it feels like is happening; sure, it's not that simple, but you can code a base GPT in an afternoon. |
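The yes/no toy program described above takes a few lines (a throwaway simulation; the seed, sample count, and "ground truth" here are arbitrary):

```python
import random

def yes_no_oracle(question, rng):
    """Answers any yes/no question by coin flip -- right about 50% of the time."""
    return "yes" if rng.random() < 0.5 else "no"

rng = random.Random(0)
answers = [yes_no_oracle("will it rain?", rng) for _ in range(10_000)]
truth = [rng.random() < 0.5 for _ in range(10_000)]  # arbitrary made-up ground truth
hits = sum((a == "yes") == t for a, t in zip(answers, truth))
print(hits / len(truth))  # hovers around 0.5
```

Biasing the answer toward "yes" for certain keywords is one `if` statement away, which is the whole point: superficially impressive behavior from a trivial mechanism.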
| |
| ▲ | simianwords 2 days ago | parent [-] | | If it were not "just a statistical next token machine", how different would it behave? Can you find an example and test it out? | | |
| ▲ | xandrius 2 days ago | parent | next [-] | | Wait, you're asking me to find and produce an example of a feasible and better alternative to LLMs when they are the current forefront of AI technology? Anyway, just to play along: if it weren't just a statistical next token machine, the same question would always have the same answer and not be affected by a "temperature" value. | | |
| ▲ | simianwords 2 days ago | parent [-] | | That's also how humans behave. I don't see how non-determinism tells me anything. My question was a bit different: if it were not just a statistical next token predictor, would you expect it to answer hard questions? Or something like that. What's the threshold of questions you want it to answer accurately? | | |
| ▲ | camgunz 2 days ago | parent [-] | | Well, large models are (kinda) non-deterministic in two ways. The first is that you actually provide many of them with a seed, which is easy to manage: just use the same seed for the same result. The second is that you actually have very little control over the "neural pathways" the model will use to respond to the prompt. This is the baffling part: you'll prompt a model to generate a green plant, and it works; you prompt it to generate a purple plant, and it generates an abstract demon dog with too many teeth. Anyway, neither of these things describes human non-determinism. You can't reuse the seed you used with me yesterday to get the exact same conversation, and I don't behave wildly unpredictably given conceptually very similar input. |
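The seed point is easy to demonstrate with an ordinary seeded PRNG standing in for a model's sampler (a toy stand-in, not a real LLM API):

```python
import random

def generate(seed, n=5):
    """Stand-in for seeded model sampling: the same seed replays the same 'output'."""
    rng = random.Random(seed)
    return [rng.randint(0, 99) for _ in range(n)]

print(generate(1234) == generate(1234))  # True: reruns with the same seed match exactly
print(generate(1234) == generate(5678))  # different seed, different sequence
```

The stochasticity is entirely in the sampler, so fixing the seed removes it; that's the "easy to manage" kind of non-determinism.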
|
| |
| ▲ | Apocryphon 2 days ago | parent | prev [-] | | How do non-LLM based World Models behave? | | |
| ▲ | simianwords 2 days ago | parent [-] | | Not sure, can you tell? I feel like you are saying that they may be able to move, etc. |
|
|
|
|
| ▲ | stavros 2 days ago | parent | prev | next [-] |
| I think there are two types of discussions when it comes to LLMs: some people talk about whether LLMs are "human", and some people talk about whether LLMs are "useful" (i.e. they perform specific cognitive tasks at least as well as humans). Both of those aspects are called "intelligence", and thus these two groups cannot understand each other. |
|
| ▲ | delusional 2 days ago | parent | prev [-] |
| > I'm struggling to find the right description I think you're circling the concept of a "soul". It is the reason that, in non-communicative disabled people, we still see a life. I've wanted to make an art piece. It would be a chatbox claiming to connect you to the first real intelligence, but that intelligence would be non-communicative. I'd assure you that it is the most intelligent being, that it has a soul, but that it just couldn't write back. Intelligence and soul are not purely measurable phenomena. A man can do nothing but stupid things, say nothing but outright lies, and still be the most intelligent person. Intelligence is within. |