| ▲ | alyxya 11 hours ago |
| The impactful innovations in AI these days aren't really from scaling models to be larger. Higher benchmark scores are easier to show concretely, and they imply higher intelligence, but that higher intelligence doesn't necessarily translate to every user feeling like the model has significantly improved for their use case. Models sometimes still struggle with simple questions like counting letters in a word, and most people don't have a use case of a model needing PhD-level research ability. Research now matters more than scaling, because research can fix limitations that scaling alone can't. I'd also argue that we're in the age of product, where the integration of product and models plays a major role in what they can do combined. |
|
| ▲ | pron 11 hours ago | parent | next [-] |
| > this implies higher intelligence
Not necessarily. The problem is that we can't precisely define intelligence (or, at least, haven't so far), and we certainly can't (yet?) measure it directly. So what we have are certain tests whose scores, we believe, are correlated with that vague thing we call intelligence in humans. Except these test scores can correlate with intelligence (whatever it is) in humans and at the same time correlate with something that's not intelligence in machines. So a high score may well imply high intelligence in humans but not in machines (e.g. perhaps because machine models may overfit more than a human brain does, and so an intelligence test designed for humans doesn't necessarily measure the same thing we mean by "intelligence" when applied to a machine).
This is like the following situation: imagine we have some type of signal, and the only process we know that produces that type of signal is process A. Process A always produces signals that contain a maximal frequency of X Hz, so we devise a test for classifying signals of that type based on sampling them at 2X Hz. Then we discover some process B that produces a similar type of signal, and we apply the same test to classify its signals in the same way. Only, process B can produce signals containing a maximal frequency of 10X Hz, so our test is not suitable for classifying the signals produced by process B (we'd need a different test that samples at 20X Hz). |
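
A minimal NumPy sketch of the sampling analogy above (my own illustration, not part of the comment; X is taken as 1 Hz and the frequencies are chosen arbitrarily): a component above half the sampling rate aliases onto a lower frequency, so the samples no longer identify the underlying signal.

```python
import numpy as np

fs = 2.0            # sampling rate of the test: 2X Hz, adequate for process A (max X = 1 Hz)
n = np.arange(16)
t = n / fs          # sample instants

f_a = 0.5           # a frequency process A can produce (below X = 1 Hz)
f_b = f_a + 4 * fs  # 8.5 Hz, within the 10X Hz range process B can produce

samples_a = np.sin(2 * np.pi * f_a * t)
samples_b = np.sin(2 * np.pi * f_b * t)

# At 2X Hz the two signals produce identical samples: the test built for
# process A appears to "classify" process B's signal, but it is no longer
# measuring what it was designed to measure.
print(np.allclose(samples_a, samples_b))  # True
```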
| |
| ▲ | matu3ba 10 hours ago | parent | next [-] | | My definition of intelligence is the capability to process and formalize a deterministic action from given inputs as a transferable entity/medium.
In other words: knowing how to manipulate the world, directly and indirectly, via deterministic actions and known inputs, and being able to teach others through various mediums.
For example, you can be very intelligent at software programming but socially very dumb (e.g. unable to socially influence others). Likewise, if you don't understand another person's language and don't understand that person's work or its influence, then you have no basis for judging their intelligence beyond your general assumptions about how smart humans are. ML/AI on text inputs is at best stochastic over language in the context window, or plain wrong, so it does not satisfy the definition. Well (formally) specified problems with a smaller scope tend to work well, from what I've seen so far.
The working ML/AI problems known to me are calibration/optimization problems. What is your definition? | | |
| ▲ | pron 10 hours ago | parent [-] | | > My definition of intelligence is the capability to process and formalize a deterministic action from given inputs as a transferable entity/medium.
I don't think that's a good definition, because many deterministic processes - including those at the core of important problems, such as those pertaining to the economy - are highly non-linear, and we don't necessarily think that "more intelligence" is what's needed to simulate them better. I mean, we've proven that predicting certain things (even those that require nothing but deduction) requires more computational resources regardless of the algorithm used for the prediction. Formalising a process, i.e. inferring the rules from observation through induction, may also depend on available computational resources.
> What is your definition?
I don't have one, except for "an overall quality of the mental processes that humans present more than other animals". |
| |
| ▲ | alyxya 11 hours ago | parent | prev [-] | | Fair, I think it would be more appropriate to say higher capacity. | | |
| ▲ | pron 11 hours ago | parent [-] | | Ok, but the point of a test of this kind is to generalise its result. I.e. the whole point of an intelligence test is that we believe a human getting a high score on such a test is more likely to do some useful things not on the test better than a human with a low score. But if the problem is that the test results - as you said - don't generalise as we expect them to, then the tests are not very meaningful to begin with. If we don't know what to expect from a machine with a high test score when it comes to doing things not on the test, then the only "capacity" we're measuring is the capacity to do well on such tests, and that's not very useful. |
|
|
|
| ▲ | TheBlight 11 hours ago | parent | prev | next [-] |
| "Scaling" is going to eventually apply to the ability to run more and higher fidelity simulations such that AI can run experiments and gather data about the world as fast and as accurately as possible. Pre-training is mostly dead. The corresponding compute spend will be orders of magnitude higher. |
| |
| ▲ | alyxya 11 hours ago | parent [-] | | That's true. I expect more inference-time scaling, and hybrid inference/training-time scaling once there's continual learning, rather than scaling of model size or pretraining compute. | | |
| ▲ | TheBlight 11 hours ago | parent [-] | | Simulation scaling will be the most insane, though. Simulating "everything" at the quantum level is impossible, and the vast majority of new learning won't require anything near that. But answers to the hardest questions will require getting as close to it as possible, so it will be tried. Millions upon millions of times. It's hard to imagine. |
|
|
|
| ▲ | nutjob2 11 hours ago | parent | prev | next [-] |
| > this implies higher intelligence
Models aren't intelligent; the intelligence is latent in the text (etc.) that the model ingests. There is no concrete definition of intelligence, only that humans have it (in varying degrees). The best you can really state is that a model extracts/reveals/harnesses more intelligence from its training data. |
| |
| ▲ | darkmighty 11 hours ago | parent | next [-] | | There is no concrete definition of a chair either. | | | |
| ▲ | dragonwriter 11 hours ago | parent | prev [-] | | > There is no concrete definition of intelligence
Note that if this is true (and it is!), all the other statements about intelligence and where it is and isn't found in the post (and elsewhere) are meaningless. | | |
| ▲ | interstice 9 hours ago | parent [-] | | I did notice that: the person you replied to made a categorical statement about intelligence, then immediately denied that there is anything concrete to make statements about. |
|
|
|
| ▲ | jfim 10 hours ago | parent | prev | next [-] |
| Counting letters is tricky for LLMs because they operate on tokens, not letters. From the perspective of an LLM, if you ask it "this is a sentence, count the letters in it", it doesn't see a stream of characters like we do; it sees [851, 382, 261, 21872, 11, 3605, 290, 18151, 306, 480]. |
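
A quick way to see this, using the tiktoken library as one example tokenizer (the exact IDs depend on which tokenizer a given model uses, so they won't necessarily match the ones quoted above):

```python
import tiktoken

# cl100k_base is one common GPT tokenizer; other models use other vocabularies.
enc = tiktoken.get_encoding("cl100k_base")

ids = enc.encode("this is a sentence, count the letters in it")
print(ids)                             # a list of integer token IDs, not characters
print([enc.decode([i]) for i in ids])  # multi-character chunks like ' sentence'

# The model only ever sees the integer IDs; recovering the letters requires
# the tokenizer's vocabulary, which lives outside the model's weights.
```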
| |
| ▲ | tintor 7 hours ago | parent [-] | | So what? It knows the number of letters in each token, and can sum them together. | | |
| ▲ | fzzzy 6 hours ago | parent [-] | | How does it know the letters in the token? It doesn't. There's literally no mapping anywhere of the letters in a token. | | |
| ▲ | ACCount37 an hour ago | parent | next [-] | | There is a mapping. An internal, fully learned mapping that's derived from seeing misspellings and words spelled out letter by letter. Some models make it an explicit part of the training with subword regularization, but many don't. It's hard to access that mapping though. A typical LLM can semi-reliably spell common words out letter by letter - but it can't say how many of each are in a single word immediately. But spelling the word out first and THEN counting the letters? That works just fine. | |
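
A hedged sketch of the two-step "spell it out, then count" prompt described above, using the OpenAI Python client (the model name is a placeholder and the prompt wording is my own; any chat-style LLM API would work the same way):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

word = "strawberry"
prompt = (
    f"Spell the word '{word}' one letter per line, "
    "then count how many times the letter 'r' appears."
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)

# Making the model emit the letters first gives it a character-level
# representation to count over, which tends to be far more reliable than
# asking for the count directly.
print(resp.choices[0].message.content)
```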
| ▲ | danielscrubs 5 hours ago | parent | prev [-] | | If it did frequency analysis, then I would consider it to have PhD-level intelligence, not just PhD-level knowledge (like a dictionary). |
|
|
|
|
| ▲ | pessimizer 11 hours ago | parent | prev [-] |
| > most people don't have a use case of a model needing PhD-level research ability.
Models also struggle at not fabricating references or entire branches of science.
edit: "needing PhD-level research ability [to create]"? |