| ▲ | no_wizard a day ago |
| That's not at all on par with what I'm saying. There exists a generally accepted baseline definition for what crosses the threshold of intelligent behavior. We shouldn't seek to muddy this. EDIT: Generally it's accepted that a core trait of intelligence is an agent’s ability to achieve goals in a wide range of environments. This means you must be able to generalize, which in turn allows intelligent beings to react to new environments and contexts without previous experience or input. Nothing I'm aware of on the market can do this. LLMs are great at statistically inferring things, but they can't generalize, which means they lack reasoning. They also lack the ability to seek new information without prompting. The fact that all LLMs boil down to (relatively) simple mathematics should be enough to prove the point as well. They lack spontaneous reasoning, which is why the ability to generalize is key |
|
| ▲ | byearthithatius a day ago | parent | next [-] |
| "There exists a generally accepted baseline definition for what crosses the threshold of intelligent behavior" not really. The whole point they are trying to make is that the capability of these models IS ALREADY muddying the definition of intelligence. We can't really test it because the distribution it has learned is so vast. Hence why we have things like ARC now. Even if it's just gradient-descent-based distribution learning and there is no "internal system" (whatever you think that should look like) to support learning the distribution, the question is whether that is more than what we are doing or whether we are starting to replicate our own mechanisms of learning. |
| |
| ▲ | jdhwosnhw a day ago | parent | next [-] | | People’s memories are so short. Ten years ago the “well accepted definition of intelligence” was whether something could pass the Turing test. Now that goalpost has been completely blown out of the water and people are scrabbling to come up with a new one that precludes LLMs. A useful definition of intelligence needs to be measurable, based on inputs/outputs, not internal state. Otherwise you run the risk of dictating how you think intelligence should manifest, rather than what it actually is. The former is a prescription, only the latter is a true definition. | | |
| ▲ | fc417fc802 a day ago | parent | next [-] | | I frequently see this characterization and can't agree with it. If I say "well I suppose you'd at least need to do A to qualify" and then later say "huh I guess A wasn't sufficient, looks like you'll also need B" that is not shifting the goalposts. At worst it's an incomplete and ad hoc specification. More realistically it was never more than an educated guess to begin with, about something that didn't exist at the time, still doesn't appear to exist, is highly subjective, lacks a single broadly accepted rigorous definition to this very day, and ultimately boils down to "I'll know it when I see it". I'll know it when I see it, and I still haven't seen it. QED | | |
| ▲ | jdhwosnhw a day ago | parent [-] | | > If I say "well I suppose you'd at least need to do A to qualify" and then later say "huh I guess A wasn't sufficient, looks like you'll also need B" that is not shifting the goalposts. I dunno, that seems like a pretty good distillation of what moving the goalposts is. > I’ll know it when I see it, and I haven’t seen it. QED While pithily put, that’s not a compelling argument. You feel that LLMs are not intelligent. I feel that they may be intelligent. Without a decent definition of what intelligence is, the entire argument is silly. | | |
| ▲ | fc417fc802 21 hours ago | parent | next [-] | | Shifting goalposts usually (at least in my understanding) refers to changing something without valid justification that was explicitly set in a previous step (subjective wording I realize - this is off the top of my head). In an adversarial context it would be someone attempting to gain an advantage by subtly changing a premise in order to manipulate the conclusion. An incomplete list, in contrast, is not a full set of goalposts. It is more akin to a declared lower bound. I also don't think it applies to the case where the parties are made aware of a change in circumstances and update their views accordingly. > You feel that LLMs are not intelligent. I feel that they may be intelligent. Weirdly enough I almost agree with you. LLMs have certainly challenged my notion of what intelligence is. At this point I think it's more a discussion of what sorts of things people are referring to when they use that word and if we can figure out an objective description that distinguishes those things from everything else. > Without a decent definition of what intelligence is, the entire argument is silly. I completely agree. My only objection is to the notion that goalposts have been shifted since in my view they were never established in the first place. | |
| ▲ | Jensson 17 hours ago | parent | prev [-] | | > I dunno, that seems like a pretty good distillation of what moving the goalposts is. Only if you don't understand what "the goalposts" means. The goalpost isn't "pass the Turing test", the goalpost is "manage to do all the same kinds of intellectual tasks that humans do". Nobody has moved that since the start of the quest for AI. |
|
| |
| ▲ | Retric 17 hours ago | parent | prev | next [-] | | LLMs can’t pass an unrestricted Turing test. LLMs can mimic intelligence, but if you actually try and exploit their limitations the deception is still trivial to unmask. Various chat bots have long been able to pass more limited versions of a Turing test. The most extreme constraint allows for simply replaying a canned conversation, which, with a helpful human assistant, makes it indistinguishable from a human. But exploiting limitations on a testing format doesn’t have anything to do with testing for intelligence. | |
| ▲ | travisjungroth a day ago | parent | prev [-] | | I’ve realized while reading these comments that my confidence in LLMs being intelligent has significantly increased. Rather than argue any specific test, I believe no one can come up with a text-based intelligence test that 90% of literate adults can pass but the top LLMs fail. This would mean there’s no definition of intelligence you could tie to a test where humans would be intelligent but LLMs wouldn’t. A maybe more palatable idea is that having “intelligence” as a binary is insufficient. I think it’s more of an extremely skewed distribution. With how far humans are above the rest, you didn’t have to nail the cutoff point to get us on one side and everything else on the other. Maybe chimpanzees and dolphins slip in. But now, the LLMs are much closer to humans. That line is harder to draw. It’s actually not possible to draw it so that people are on one side and LLMs on the other. | | |
| ▲ | fc417fc802 a day ago | parent | next [-] | | Why presuppose that it's possible to test intelligence via text? Most humans have been illiterate for most of human history. I don't mean to claim that it isn't possible, just that I'm not clear why we should assume that it is or that there would be an obvious way of going about it. | | |
| ▲ | travisjungroth a day ago | parent [-] | | Seems pretty reasonable to presuppose this when you filter to people who are literate. That’s darn near a definition of literate, that you can engage with the text intelligently. | | |
| ▲ | fc417fc802 21 hours ago | parent [-] | | I thought the definition of literate was "can interpret text in place of the spoken word". At which point it's worth noting that text is a much lower bandwidth channel than in person communication. Also worth noting that, ex, a mute person could still be considered intelligent. Is it necessarily the case that you could discern general intelligence via a test with fixed structure, known to all parties in advance, carried out via a synthesized monotone voice? I'm not saying "you definitely can't do that" just that I don't see why we should a priori assume it to be possible. Now that likely seems largely irrelevant and out in the weeds and normally I would feel that way. But if you're going to suppose that we can't cleanly differentiate LLMs from humans then it becomes important to ask if that's a consequence of the LLMs actually exhibiting what we would consider general intelligence versus an inherent limitation of the modality in which the interactions are taking place. Personally I think it's far more likely that we just don't have very good tests yet, that our working definition of "general intelligence" (as well as just "intelligence") isn't all that great yet, and that in the end many humans who we consider to exhibit a reasonable level of such will nonetheless fail to pass tests that are based solely on an isolated exchange of natural language. | | |
| ▲ | tsimionescu 18 hours ago | parent [-] | | I generally agree with your framing, I'll just comment on a minor detail about what "literate" means. Typically, people are classed in three categories of literacy, not two: illiterate means you essentially can't read at all, literate means you can read and understand text to some level, but then there are people who are functionally illiterate - people who can read the letters and sound out text, but can't actively comprehend what they're reading to a level that allows them to function normally in society - say, being able to read and comprehend an email they receive at work or a news article. This difference between literate and functionally illiterate may have been what the poster above was referring to. Note that functional illiteracy is not some niche phenomenon, it's a huge problem in many school systems. In my own country (Romania), while the rate of illiteracy is something like <1% of the populace, the rate of functional illiteracy is estimated to be as high as 45% of those finishing school. |
|
|
| |
| ▲ | nl a day ago | parent | prev [-] | | Or maybe accept that LLMs are intelligent and it's human bias that is the oddity here. | | |
|
| |
| ▲ | dingnuts a day ago | parent | prev [-] | | How does an LLM muddy the definition of intelligence any more than a database or search engine does? They are lossy databases with a natural language interface, nothing more. | | |
| ▲ | tibbar a day ago | parent | next [-] | | Ah, but what is in the database? At this point it's clearly not just facts, but problem-solving strategies and an execution engine. A database of problem-solving strategies which you can query with a natural language description of your problem and it returns an answer to your problem... well... sounds like intelligence to me. | | |
| ▲ | uoaei a day ago | parent [-] | | > problem-solving strategies and an execution engine Extremely unfounded claims. See: the root comment of this tree. | | |
| |
| ▲ | madethisnow a day ago | parent | prev [-] | | datasets and search engines are deterministic. humans, and llms are not. | | |
| ▲ | semiquaver a day ago | parent | next [-] | | LLMs are completely deterministic. Their fundamental output is a vector representing a probability distribution of the next token given the model weights and context. Given the same inputs an identical output vector will be produced 100% of the time. This fact is relied upon by for example https://bellard.org/ts_zip/ a lossless compression system that would not work if LLMs were nondeterministic. In practice most LLM systems use this distribution (along with a “temperature” multiplier) to make a weighted random choice among the tokens, giving the illusion of nondeterminism. But there’s no fundamental reason you couldn’t for example always choose the most likely token, yielding totally deterministic output. This is an excellent and accessible series going over how transformer systems work if you want to learn more. https://youtu.be/wjZofJX0v4M | | |
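The distinction drawn in the comment above, between a deterministic output distribution and a randomized choice made from it, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the logits are made-up toy values rather than the output of any real model, and the function names (`softmax`, `greedy_choice`, `sample_choice`) are hypothetical helpers, not part of any LLM library.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution.

    Dividing by the temperature before exponentiating flattens the
    distribution (temperature > 1) or sharpens it (temperature < 1).
    """
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_choice(logits):
    """Always pick the highest-scoring token index (first one wins ties)."""
    return max(range(len(logits)), key=logits.__getitem__)


def sample_choice(logits, temperature, rng):
    """Pick a token index at random, weighted by the softmax probabilities."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Toy logits standing in for a model's next-token scores (an assumption).
toy_logits = [2.0, 1.0, 0.5, 0.1]

# Greedy decoding: same input, same output, every time.
greedy_runs = {greedy_choice(toy_logits) for _ in range(100)}

# Sampling: the result depends on the random number generator's state.
rng = random.Random()
sampled_runs = [sample_choice(toy_logits, 1.0, rng) for _ in range(100)]
```

In this sketch `greedy_runs` contains a single index, while `sampled_runs` will usually contain several different ones. The only source of variation is the `rng` argument; the distribution returned by `softmax` is identical on every call.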
| ▲ | frozenseven 20 hours ago | parent | next [-] | | >In practice most LLM systems use this distribution (along with a “temperature” multiplier) to make a weighted random choice among the tokens In other words, LLMs are not deterministic in just about any real setting. What you said there only compounds with MoE architectures, variable test-time compute allocation, and o3-like sampling. | |
| ▲ | spunker540 21 hours ago | parent | prev [-] | | i've heard it actually depends on the model / hosting architecture. some are not deterministic at the numeric level because there is so much floating point math going on in distributed fashion across gpus, with unpredictable rounding/syncing across machines |
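The rounding effect mentioned in the comment above comes from floating-point addition not being associative: summing the same numbers in a different order can give a slightly different result. The sketch below shows this with plain Python floats. It does not model any particular GPU or serving stack; whether a given deployment actually reorders its reductions is an assumption that depends on the hardware and kernels in use.

```python
# Floating-point addition is not associative: grouping changes the result.
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c   # one possible reduction order
right_to_left = a + (b + c)   # another order over the same values

orders_disagree = left_to_right != right_to_left
difference = abs(left_to_right - right_to_left)
```

The difference is tiny, but if two candidate tokens have nearly identical scores, a discrepancy of this size could in principle change which one ranks first.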
| |
| ▲ | hatefulmoron a day ago | parent | prev | next [-] | | The LLM's output is chaotic relative to the input, but it's deterministic right? Same settings, same model, same input, .. same output? Where does the chain get broken here? | | |
| ▲ | tsimionescu 18 hours ago | parent | next [-] | | Depends on what you mean specifically by the output. The actual neural network will produce deterministic outputs that could be interpreted as probability values for various tokens. But the interface you'll commonly see used in front of these models will then non-deterministically choose a single next token to output based on those probabilities. Then, this single randomly chosen output is fed back into the network to produce another token, and this process repeats. I would ultimately call the result non-deterministic. You could make it deterministic relatively easily by having a deterministic process for choosing a single token from all of the outputs of the NN (say, always pick the one with the highest weight, and if there are multiple with the same weight, pick the first one in token index order), but no one normally does this, because the results aren't that great per my understanding. | | |
| ▲ | fc417fc802 16 hours ago | parent [-] | | You can have the best of both worlds with something like weighted_selection( output, hash( output ) ) using the hash as the PRNG seed. (If you're paranoid about statistical issues due to identical outputs (extremely unlikely) then add a nonce to the hash.) |
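The `weighted_selection( output, hash( output ) )` idea from the comment above could look roughly like the following Python sketch. The helper names (`seed_from_probs`, `pick_token`), the choice of SHA-256, and the byte encoding of the probabilities are all assumptions made for illustration; the comment only specifies that a hash of the output seeds the PRNG, with an optional nonce.

```python
import hashlib
import random
import struct


def seed_from_probs(probs, nonce=b""):
    """Derive an integer PRNG seed from the probability vector itself.

    A cryptographic hash is used here (an assumption) because Python's
    built-in hash() is not guaranteed to be stable across processes
    for every type.
    """
    payload = struct.pack(f"{len(probs)}d", *probs) + nonce
    digest = hashlib.sha256(payload).digest()
    return int.from_bytes(digest[:8], "big")


def weighted_selection(probs, seed):
    """Weighted random choice of a token index, fully determined by the seed."""
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def pick_token(probs, nonce=b""):
    """Same probabilities (and nonce) in, same token index out."""
    return weighted_selection(probs, seed_from_probs(probs, nonce))


# Toy distribution standing in for a model's next-token probabilities.
toy_probs = [0.5, 0.3, 0.15, 0.05]
first = pick_token(toy_probs)
second = pick_token(toy_probs)  # identical to `first` by construction
```

The selection is still weighted by the probabilities, so low-probability tokens can be chosen, but repeating the call with the same inputs reproduces the same choice. Passing a different `nonce` changes the seed, which is the escape hatch the comment suggests for identical outputs.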
| |
| ▲ | fc417fc802 a day ago | parent | prev [-] | | Now compare a human to an LSTM with persistent internal state that you can't reset. | | |
| |
| ▲ | daveguy a day ago | parent | prev [-] | | The only reason LLMs are stochastic instead of deterministic is a random number generator. There is nothing inherently non-deterministic about LLM algorithms unless you turn up the "temperature" of selecting the next word. The fact that determinism can be changed by turning a knob is clear evidence that they are closer to a database or search engine than a human. | | |
| ▲ | travisjungroth a day ago | parent [-] | | You can turn the determinism knob on humans. Psychedelics are one method. | | |
| ▲ | mrob a day ago | parent [-] | | I think that's more adjusting the parameters of the built-in denoising and feature detection circuits of the inherently noisy analog computer that is the brain. |
|
|
|
|
|
|
| ▲ | david-gpu a day ago | parent | prev | next [-] |
| > There exists a generally accepted baseline definition for what crosses the threshold of intelligent behavior. Go on. We are listening. |
|
| ▲ | nmarinov a day ago | parent | prev | next [-] |
| I think the confusion is because you're referring to a common understanding of what AI is but I think the definition of AI is different for different people. Can you give your definition of AI? Also what is the "generally accepted baseline definition for what crosses the threshold of intelligent behavior"? |
|
| ▲ | voidspark a day ago | parent | prev | next [-] |
| You are doubling down on a muddled vague non-technical intuition about these terms. Please tell us what that "baseline definition" is. |
|
| ▲ | appleorchard46 a day ago | parent | prev | next [-] |
| > Generally it's accepted that a core trait of intelligence is an agent’s ability to achieve goals in a wide range of environments. Be that as it may, a core trait is very different from a generally accepted threshold. What exactly is the threshold? Which environments are you referring to? How is it being measured? What goals are they? You may have quantitative and unambiguous answers to these questions, but I don't think they would be commonly agreed upon. |
|
| ▲ | highfrequency a day ago | parent | prev | next [-] |
| What is that baseline threshold for intelligence? Could you provide concrete and objective results, that if demonstrated by a computer system would satisfy your criteria for intelligence? |
| |
| ▲ | no_wizard a day ago | parent [-] | | see the edit. boils down to the ability to generalize, LLMs can't generalize. I'm not the only one who holds this view either. Francois Chollet, a former intelligence researcher at Google also shares this view. | | |
| ▲ | highfrequency a day ago | parent | next [-] | | Are you able to formulate "generalization" in a concrete and objective way that could be achieved unambiguously, and is currently achieved by a typical human? A lot of people would say that LLMs generalize pretty well - they certainly can understand natural language sequences that are not present in their training data. | | |
| ▲ | whilenot-dev 13 hours ago | parent [-] | | > A lot of people would say that LLMs generalize pretty well What do you mean here? The trained model, the inference engine, is the one that makes an LLM for "a lot of people". > they certainly can understand natural language sequences that are not present in their training data Keeping the trained model as LLM in mind, I think learning a language includes generalization and is typically achieved by a human, so I'll try to formulate: Can a trained LLM learn languages that weren't in its training set just by chatting/prompting? Given that any Korean texts were excluded from the training set, could Korean be learned? Does that even work with languages descending from the same language family (Spanish in the training set but Italian should be learned)? |
| |
| ▲ | voidspark a day ago | parent | prev | next [-] | | Chollet's argument was that it's not "true" generalization, which would be at the level of human cognition. He sets the bar so high that it becomes a No True Scotsman fallacy. The deep neural networks are practically generalizing well enough to solve many tasks better than humans. | | |
| ▲ | daveguy a day ago | parent [-] | | No. His argument is definitely closer to LLMs can't generalize. I think you would benefit from re-reading the paper. The point is that a puzzle consisting of simple reasoning about simple priors should be a fairly low bar for "intelligence" (necessary but not sufficient). LLMs perform abysmally because they have a very specific purpose-trained goal that is different from solving the ARC puzzles. Humans solve these easily. And committees of humans do so perfectly. If LLMs were intelligent they would be able to construct algorithms consisting of simple applications of the priors. Training to a specific task and getting better is completely orthogonal to generalized search and application of priors. Humans do a mix of both search of the operations and pattern matching of recognizing the difference between start and stop state. That is because their "algorithm" is so general purpose. And we have very little idea how the two are combined efficiently. At least this is how I interpreted the paper. | | |
| ▲ | voidspark a day ago | parent [-] | | He is setting a bar, saying that that is the "true" generalization. Deep neural networks are definitely performing generalization at a certain level that beats humans at translation or Go, just not at his ARC bar. He may not think it's good enough, but it's still generalization whether he likes it or not. | | |
| ▲ | fc417fc802 a day ago | parent [-] | | I'm not convinced either of your examples is generalization. Consider Go. I don't consider a procedural chess engine to be "generalized" in any sense yet a decent one can easily beat any human. Why then should Go be different? | | |
| ▲ | voidspark a day ago | parent [-] | | A procedural chess engine does not perform generalization, in ML terms. That is an explicitly programmed algorithm. Generalization has a specific meaning in the context of machine learning. The AlphaGo Zero model learned advanced strategies of the game, starting with only the basic rules of the game, without being programmed explicitly. That is generalization. | | |
| ▲ | fc417fc802 a day ago | parent [-] | | Perhaps I misunderstand your point but it seems to me that by the same logic a simple gradient descent algorithm wired up to a variety of different models and simulations would qualify as generalization during the training phase. The trouble with this is that it only ever "generalizes" approximately as far as the person configuring the training run (and implementing the simulation and etc) ensures that it happens. In which case it seems analogous to an explicitly programmed algorithm to me. Even if we were to accept the training phase as a very limited form of generalization it still wouldn't apply to the output of that process. The trained LLM as used for inference is no longer "learning". The point I was trying to make with the chess engine was that it doesn't seem that generalization is required in order to perform that class of tasks (at least in isolation, ie post-training). Therefore, it should follow that we can't use "ability to perform the task" (ie beat a human at that type of board game) as a measure for whether or not generalization is occurring. Hypothetically, if you could explain a novel rule set to a model in natural language, play a series of several games against it, and following that it could reliably beat humans at that game, that would indeed be a type of generalization. However my next objection would then be, sure, it can learn a new turn based board game, but if I explain these other five tasks to it that aren't board games and vary widely can it also learn all of those in the same way? Because that's really what we seem to mean when we say that humans or dogs or dolphins or whatever possess intelligence in a general sense. | | |
| ▲ | voidspark 21 hours ago | parent [-] | | You're muddling up some technical concepts here in a very confusing way. Generalization is the ability for a model to perform well on new unseen data within the same task that it was trained for. It's not about the training process itself. Suppose I showed you some examples of multiplication tables, and you figured out how to multiply 19 * 42 without ever having seen that example before. That is generalization. You have recognized the underlying pattern and applied it to a new case. AlphaGo Zero trained on games that it generated by playing against itself, but how that data was generated is not the point. It was able to generalize from that information to learn deeper principles of the game to beat human players. It wasn't just memorizing moves from a training set. > However my next objection would then be, sure, it can learn a new turn based board game, but if I explain these other five tasks to it that aren't board games and vary widely can it also learn all of those in the same way? Because that's really what we seem to mean when we say that humans or dogs or dolphins or whatever possess intelligence in a general sense. This is what LLMs have already demonstrated - a rudimentary form of AGI. They were originally trained for language translation and a few other NLP tasks, and then we found they have all these other abilities. | | |
| ▲ | fc417fc802 20 hours ago | parent [-] | | > Generalization is the ability for a model to perform well on new unseen data within the same task that it was trained for. By that logic a chess engine can generalize in the same way that AlphaGo Zero does. It is a black box that has never seen the vast majority of possible board positions. In fact it's never seen anything at all because unlike an ML model it isn't the result of an optimization algorithm (at least the old ones, back before they started incorporating ML models). If your definition of "generalize" depends on "is the thing under consideration an ML model or not" then the definition is broken. You need to treat the thing being tested as a black box, scoring only based on inputs and outputs. Writing the chess engine is analogous to wiring up the untrained model, the optimization algorithm, and the simulation followed by running it. Both tasks require thoughtful work by the developer. The finished chess engine is analogous to the trained model. > They were originally trained for ... I think you're in danger here of a definition that depends intimately on intent. It isn't clear that they weren't inadvertently trained for those other abilities at the same time. Moreover, unless those additional abilities to be tested for were specified ahead of time you're deep into post hoc territory. | | |
| ▲ | voidspark 19 hours ago | parent [-] | | You're way off. This is not my personal definition of generalization. We are talking about a very specific technical term in the context of machine learning. An explicitly programmed chess engine does not generalize, by definition. It doesn't learn from data. It is an explicitly programmed algorithm. I recommend you go do some reading about machine learning basics. https://www.cs.toronto.edu/~lczhang/321/notes/notes09.pdf | | |
| ▲ | fc417fc802 16 hours ago | parent | next [-] | | I thought we were talking about metrics of intelligence. Regardless, the terminology overlaps. As far as metrics of intelligence go, the algorithm is a black box. We don't care how it works or how it was constructed. The only thing we care about is (something like) how well it performs across an array of varied tasks that it hasn't encountered before. That is to say, how general the black box is. Notice that in the case of typical ML algorithms the two usages are equivalent. If the approach generalizes (from training) then the resulting black box would necessarily be assessed as similarly general. So going back up the thread a ways. Someone quotes Chollet as saying that LLMs can't generalize. You object that he sets the bar too high - that, for example, they generalize just fine at Go. You can interpret that using either definition. The result is the same. As far as measuring intelligence is concerned, how is "generalizes on the task of Go" meaningfully better than a procedural chess engine? If you reject the procedural chess engine as "not intelligent" then it seems to me that you must also reject an ML model that does nothing but play Go. > An explicitly programmed chess engine does not generalize, by definition. It doesn't learn from data. It is an explicitly programmed algorithm. Following from above, I don't see the purpose of drawing this distinction in context since the end result is the same. Sure, without a training task you can't compare performance between the training run and something else. You could use that as a basis to exclude entire classes of algorithms, but to what end? | | |
| ▲ | voidspark 5 hours ago | parent [-] | | We still have this mixup with the term "generalize". ML generalization is not the same as "generalness". The model learns from data to infer strategies for its task (generalization). This is a completely different paradigm to an explicitly programmed rules engine which does not learn and cannot generalize. |
| |
| ▲ | daveguy 9 hours ago | parent | prev [-] | | If you are using the formal definition of generalization in a machine learning context, then you completely misrepresented Chollet's claims. He doesn't say much about generalization in the sense of in-distribution, unseen data. Any AI algorithm worth a damn can do that to some degree. His argument is about transfer learning, which is simply a more robust form of generalization to out-of-distribution data. A network trained on Go cannot generalize to translation and vice versa. Maybe you should stick to a single definition of "generalization" and make that definition clear before you accuse people of needing to read ML basics. | | |
| ▲ | voidspark 5 hours ago | parent [-] | | I was replying to a claim that LLMs "can’t generalize" at all, and I showed they do within their domain. No I haven't completely misrepresented the claims. Chollet is just setting a high bar for generalization. |
|
|
|
|
|
|
|
|
|
| |
| ▲ | stevenAthompson a day ago | parent | prev [-] | | > Francois Chollet, a former intelligence researcher at Google also shares this view. Great, now there are two of you. |
|
|
|
| ▲ | aj7 a day ago | parent | prev | next [-] |
| LLMs are statistically great at inferring things?
Pray tell me how often Google’s AI search paragraph, at the top, is correct or useful. Is that statistically great? |
|
| ▲ | nl a day ago | parent | prev | next [-] |
| > Generally it's accepted that a core trait of intelligence is an agent’s ability to achieve goals in a wide range of environments. This is the embodiment argument - that intelligence requires the ability to interact with its environment. Far from being generally accepted, it's a controversial take. Could Stephen Hawking achieve goals in a wide range of environments without help? And yet it's still generally accepted that Stephen Hawking was intelligent. |
|
| ▲ | nurettin a day ago | parent | prev [-] |
| > intelligence is an agent’s ability to achieve goals in a wide range of environments. This means you must be able to generalize, which in turn allows intelligent beings to react to new environments and contexts without previous experience or input. I applaud the bravery of trying to one shot a definition of intelligence, but no intelligent being acts without previous experience or input. If you're talking about in-sample vs out of sample, LLMs do that all the time. At some point in the conversation, they encounter something completely new and react to it in a way that emulates an intelligent agent. What really makes them tick is language being a huge part of the intelligence puzzle, and language is something LLMs can generate at will. When we discover and learn to emulate the rest, we will get closer and closer to super intelligence. |