| ▲ | byearthithatius a day ago |
| "There exists a generally accepted baseline definition for what crosses the threshold of intelligent behavior" not really. The whole point they are trying to make is that the capability of these models IS ALREADY muddying the definition of intelligence. We can't really test it because the distribution it has learned is so vast. Hence why we have things like ARC now. Even if it's just gradient-descent-based distribution learning and there is no "internal system" (whatever you think that should look like) to support learning the distribution, the question is whether that is more than what we are doing, or whether we are starting to replicate our own mechanisms of learning. |
|
| ▲ | jdhwosnhw a day ago | parent | next [-] |
| People’s memories are so short. Ten years ago the “well accepted definition of intelligence” was whether something could pass the Turing test. Now that goalpost has been completely blown out of the water, and people are scrambling to come up with a new one that precludes LLMs. A useful definition of intelligence needs to be measurable, based on inputs/outputs, not internal state. Otherwise you run the risk of dictating how you think intelligence should manifest, rather than describing what it actually is. The former is a prescription; only the latter is a true definition. |
| |
| ▲ | fc417fc802 a day ago | parent | next [-] | | I frequently see this characterization and can't agree with it. If I say "well I suppose you'd at least need to do A to qualify" and then later say "huh I guess A wasn't sufficient, looks like you'll also need B" that is not shifting the goalposts. At worst it's an incomplete and ad hoc specification. More realistically it was never more than an educated guess to begin with, about something that didn't exist at the time, still doesn't appear to exist, is highly subjective, lacks a single broadly accepted rigorous definition to this very day, and ultimately boils down to "I'll know it when I see it". I'll know it when I see it, and I still haven't seen it. QED | | |
▲ | jdhwosnhw a day ago | parent [-] | | > If I say "well I suppose you'd at least need to do A to qualify" and then later say "huh I guess A wasn't sufficient, looks like you'll also need B" that is not shifting the goalposts. I dunno, that seems like a pretty good distillation of what moving the goalposts is. > I’ll know it when I see it, and I haven’t seen it. QED While pithily put, that’s not a compelling argument. You feel that LLMs are not intelligent. I feel that they may be intelligent. Without a decent definition of what intelligence is, the entire argument is silly. | | |
▲ | fc417fc802 21 hours ago | parent | next [-] | | Shifting goalposts usually (at least in my understanding) refers to changing something without valid justification that was explicitly set in a previous step (subjective wording, I realize - this is off the top of my head). In an adversarial context it would be someone attempting to gain an advantage by subtly changing a premise in order to manipulate the conclusion. An incomplete list, in contrast, is not a full set of goalposts. It is more akin to a declared lower bound. I also don't think it applies to the case where the parties are made aware of a change in circumstances and update their views accordingly. > You feel that LLMs are not intelligent. I feel that they may be intelligent. Weirdly enough I almost agree with you. LLMs have certainly challenged my notion of what intelligence is. At this point I think it's more a discussion of what sorts of things people are referring to when they use that word and whether we can figure out an objective description that distinguishes those things from everything else. > Without a decent definition of what intelligence is, the entire argument is silly. I completely agree. My only objection is to the notion that goalposts have been shifted, since in my view they were never established in the first place. | |
▲ | Jensson 17 hours ago | parent | prev [-] | | > I dunno, that seems like a pretty good distillation of what moving the goalposts is. Only if you don't understand what "the goalposts" means. The goalpost isn't "pass the Turing test", the goalpost is "manage to do all the same kinds of intellectual tasks that humans can" - nobody has moved that since the start of the quest for AI. |
|
| |
▲ | Retric 17 hours ago | parent | prev | next [-] | | LLMs can’t pass an unrestricted Turing test. LLMs can mimic intelligence, but if you actually try to exploit their limitations the deception is still trivial to unmask. Various chat bots have long been able to pass more limited versions of a Turing test. The most extreme constraint allows for simply replaying a canned conversation, which with a helpful human assistant makes it indistinguishable from a human. But exploiting limitations on a testing format doesn’t have anything to do with testing for intelligence. | |
▲ | travisjungroth a day ago | parent | prev [-] | | I’ve realized while reading these comments that my opinion of LLMs being intelligent has significantly strengthened. Rather than argue any specific test, I believe no one can come up with a text-based intelligence test that 90% of literate adults can pass but the top LLMs fail. This would mean there’s no definition of intelligence you could tie to a test where humans would be intelligent but LLMs wouldn’t. A maybe more palatable idea is that having “intelligence” as a binary is insufficient. I think it’s more of an extremely skewed distribution. With how far humans are above the rest, you didn’t have to nail the cutoff point to get us on one side and everything else on the other. Maybe chimpanzees and dolphins slip in. But now, the LLMs are much closer to humans. That line is harder to draw. Actually, it’s not possible to draw it so that people are on one side and LLMs on the other. | | |
| ▲ | fc417fc802 a day ago | parent | next [-] | | Why presuppose that it's possible to test intelligence via text? Most humans have been illiterate for most of human history. I don't mean to claim that it isn't possible, just that I'm not clear why we should assume that it is or that there would be an obvious way of going about it. | | |
| ▲ | travisjungroth a day ago | parent [-] | | Seems pretty reasonable to presuppose this when you filter to people who are literate. That’s darn near a definition of literate, that you can engage with the text intelligently. | | |
▲ | fc417fc802 21 hours ago | parent [-] | | I thought the definition of literate was "can interpret text in place of the spoken word". At which point it's worth noting that text is a much lower bandwidth channel than in-person communication. Also worth noting that, e.g., a mute person could still be considered intelligent. Is it necessarily the case that you could discern general intelligence via a test with fixed structure, known to all parties in advance, carried out via a synthesized monotone voice? I'm not saying "you definitely can't do that", just that I don't see why we should a priori assume it to be possible. Now that likely seems largely irrelevant and out in the weeds, and normally I would feel that way. But if you're going to suppose that we can't cleanly differentiate LLMs from humans, then it becomes important to ask whether that's a consequence of the LLMs actually exhibiting what we would consider general intelligence versus an inherent limitation of the modality in which the interactions are taking place. Personally I think it's far more likely that we just don't have very good tests yet, that our working definition of "general intelligence" (as well as just "intelligence") isn't all that great yet, and that in the end many humans who we consider to exhibit a reasonable level of such will nonetheless fail to pass tests that are based solely on an isolated exchange of natural language. | | |
| ▲ | tsimionescu 18 hours ago | parent [-] | | I generally agree with your framing, I'll just comment on a minor detail about what "literate" means. Typically, people are classed in three categories of literacy, not two: illiterate means you essentially can't read at all, literate means you can read and understand text to some level, but then there are people who are functionally illiterate - people who can read the letters and sound out text, but can't actively comprehend what they're reading to a level that allows them to function normally in society - say, being able to read and comprehend an email they receive at work or a news article. This difference between literate and functionally illiterate may have been what the poster above was referring to. Note that functional illiteracy is not some niche phenomenon, it's a huge problem in many school systems. In my own country (Romania), while the rate of illiteracy is something like <1% of the populace, the rate of functional illiteracy is estimated to be as high as 45% of those finishing school. |
|
|
| |
| ▲ | nl a day ago | parent | prev [-] | | Or maybe accept that LLMs are intelligent and it's human bias that is the oddity here. | | |
|
|
|
| ▲ | dingnuts a day ago | parent | prev [-] |
| How does an LLM muddy the definition of intelligence any more than a database or search engine does? They are lossy databases with a natural language interface, nothing more. |
| |
| ▲ | tibbar a day ago | parent | next [-] | | Ah, but what is in the database? At this point it's clearly not just facts, but problem-solving strategies and an execution engine. A database of problem-solving strategies which you can query with a natural language description of your problem and it returns an answer to your problem... well... sounds like intelligence to me. | | |
| ▲ | uoaei a day ago | parent [-] | | > problem-solving strategies and an execution engine Extremely unfounded claims. See: the root comment of this tree. | | |
| |
▲ | madethisnow a day ago | parent | prev [-] | | Datasets and search engines are deterministic. Humans and LLMs are not. | | |
▲ | semiquaver a day ago | parent | next [-] | | LLMs are completely deterministic. Their fundamental output is a vector representing a probability distribution of the next token given the model weights and context. Given the same inputs, an identical output vector will be produced 100% of the time. This fact is relied upon by, for example, https://bellard.org/ts_zip/ , a lossless compression system that would not work if LLMs were nondeterministic. In practice most LLM systems use this distribution (along with a “temperature” multiplier) to make a weighted random choice among the tokens, giving the illusion of nondeterminism. But there’s no fundamental reason you couldn’t, for example, always choose the most likely token, yielding totally deterministic output. This is an excellent and accessible series going over how transformer systems work if you want to learn more. https://youtu.be/wjZofJX0v4M | |
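[Editor's note: the distinction above can be sketched in a few lines. The logit values are hypothetical stand-ins for a real model's per-token scores; real vocabularies have tens of thousands of entries.]

```python
import math
import random

def softmax(logits, temperature=1.0):
    # Scale logits by temperature, then normalize into a probability distribution.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens

# The model's raw output is deterministic: same logits every time for same input.
probs = softmax(logits)

# Deterministic decoding: always pick the most likely token (greedy/argmax).
greedy = probs.index(max(probs))

# What most chat systems do instead: weighted random choice with temperature,
# which is where the apparent nondeterminism comes from.
sampled = random.choices(range(len(logits)), weights=softmax(logits, temperature=0.8))[0]
```

Greedy decoding always yields index 0 here; only the sampling step introduces randomness, and that step sits outside the network itself.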
| ▲ | frozenseven 20 hours ago | parent | next [-] | | >In practice most LLM systems use this distribution (along with a “temperature” multiplier) to make a weighted random choice among the tokens In other words, LLMs are not deterministic in just about any real setting. What you said there only compounds with MoE architectures, variable test-time compute allocation, and o3-like sampling. | |
▲ | spunker540 21 hours ago | parent | prev [-] | | I've heard it actually depends on the model / hosting architecture. Some are not deterministic at the numeric level because there is so much floating-point math going on in distributed fashion across GPUs, with unpredictable rounding/syncing across machines. |
| |
| ▲ | hatefulmoron a day ago | parent | prev | next [-] | | The LLM's output is chaotic relative to the input, but it's deterministic right? Same settings, same model, same input, .. same output? Where does the chain get broken here? | | |
▲ | tsimionescu 17 hours ago | parent | next [-] | | Depends on what you mean specifically by the output. The actual neural network will produce deterministic outputs that could be interpreted as probability values for various tokens. But the interface you'll commonly see used in front of these models will then non-deterministically choose a single next token to output based on those probabilities. Then, this single randomly chosen output is fed back into the network to produce another token, and this process repeats. I would ultimately call the result non-deterministic. You could make it deterministic relatively easily by having a deterministic process for choosing a single token from all of the outputs of the NN (say, always pick the one with the highest weight, and if there are multiple with the same weight, pick the first one in token index order), but no one normally does this, because the results aren't that great per my understanding. | |
| ▲ | fc417fc802 15 hours ago | parent [-] | | You can have the best of both worlds with something like weighted_selection( output, hash( output ) ) using the hash as the PRNG seed. (If you're paranoid about statistical issues due to identical outputs (extremely unlikely) then add a nonce to the hash.) |
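[Editor's note: a minimal sketch of the hash-seeded idea above. `weighted_selection` and the token names are illustrative, not a real library API; the point is that seeding the PRNG from the model's own output distribution makes the weighted choice reproducible while preserving its statistics.]

```python
import hashlib
import random

def weighted_selection(tokens, weights, seed_material):
    # Derive a PRNG seed from a hash of the output distribution itself,
    # so identical inputs always produce the identical "random" choice.
    digest = hashlib.sha256(seed_material.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return rng.choices(tokens, weights=weights)[0]

tokens = ["cat", "dog", "fish"]   # hypothetical candidate tokens
weights = [0.7, 0.2, 0.1]         # hypothetical next-token probabilities
output_repr = repr(list(zip(tokens, weights)))  # stands in for the raw output vector

first = weighted_selection(tokens, weights, output_repr)
second = weighted_selection(tokens, weights, output_repr)
# Same distribution -> same hash -> same seed -> same choice.
```

Repeating the call with the same distribution always returns the same token, which is the "best of both worlds" being described: sampling behavior with deterministic replay.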
| |
| ▲ | fc417fc802 a day ago | parent | prev [-] | | Now compare a human to an LSTM with persistent internal state that you can't reset. | | |
| |
| ▲ | daveguy a day ago | parent | prev [-] | | The only reason LLMs are stochastic instead of deterministic is a random number generator. There is nothing inherently non-deterministic about LLM algorithms unless you turn up the "temperature" of selecting the next word. The fact that determinism can be changed by turning a knob is clear evidence that they are closer to a database or search engine than a human. | | |
| ▲ | travisjungroth a day ago | parent [-] | | You can turn the determinism knob on humans. Psychedelics are one method. | | |
| ▲ | mrob a day ago | parent [-] | | I think that's more adjusting the parameters of the built-in denoising and feature detection circuits of the inherently noisy analog computer that is the brain. |
|
|
|
|