maxilevi 5 hours ago

LLMs are just really good search. Ask it to create something and it's searching within the pretrained weights. Ask it to find something and it's semantically searching within your codebase. Ask it to modify something and it will do both. Once you understand it's just search, you can get really good results.
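
To make the "semantically searching within your codebase" half concrete, here's a minimal sketch of embedding-based retrieval. Treat it as an illustration, not any tool's actual implementation; embed() is a hypothetical placeholder for whatever text-embedding model you happen to use.

  # Minimal sketch of semantic search over code chunks.
  # `embed` is a placeholder: any function mapping text -> vector works.
  import numpy as np

  def cosine(a, b):
      return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

  def semantic_search(query, chunks, embed, top_k=5):
      # Rank code chunks by cosine similarity to the query, return the best few.
      q = embed(query)
      return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]

The "create something" case is the same move against a different index: the weights instead of your files.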

fennecbutt 3 hours ago | parent | next [-]

I agree somewhat, but more so when it comes to its use of logic - it only gleans logic from human language, which, as we know, is a fucking mess.

I've commented before on my belief that the majority of human activity is derivative. If you ask someone to think of a new kind of animal, alien, or random object, they will always base it on things they have seen before. Truly original thoughts and things in this world are an absolute rarity; the majority of supposedly original thought riffs on what we see others make, and those people look to nature and the natural world for inspiration.

We're very good at taking thing A and thing B, slapping them together, and announcing we've made something new. Someone please reply with a wholly original concept. I had the same issue recently when trying to build a magic-based physics system for a game I was thinking of prototyping.

andy99 2 hours ago | parent | next [-]

  it only gleans logic from human language
This isn’t really true, at least as I interpret the statement: little if any of the “logic”, or the appearance of it, is learned from language. It’s trained in with reinforcement learning, as pattern recognition.

Point being, it’s deliberate training, not just some emergent property of language modeling. Not sure if the above post meant this, but it does seem to be a common misconception.

onemoresoop 3 hours ago | parent | prev [-]

LLMs lack agency in the sense that they have no goals, preferences, or commitments. Humans do, even when our ideas are derivative. We can decide that this is the right choice and move forward, subjectively and imperfectly. That capacity to commit under uncertainty is part of what agency actually is.

MrOrelliOReilly an hour ago | parent [-]

But they do have utility functions, which one can interpret as nearly equivalent.

bhadass 5 hours ago | parent | prev | next [-]

A better mental model: it's a lossy compression of human knowledge that can decompress and recombine in novel (sometimes useful, sometimes sloppy) ways.

Classical search simply retrieves; LLMs can synthesize as well.

RhythmFox 4 hours ago | parent | next [-]

This isn't strictly better to me. It captures some intuitions about how a neural network ends up encoding its inputs over time in a 'lossy' way (it doesn't store previous input states in any explicit form). Maybe saying 'probabilistic compression/decompression' makes it a bit more accurate? I don't really see how calling it compression/decompression connects to your 'synthesize' claim at the very end, but I'm curious whether you had a specific reason to use the term.

XenophileJKO 3 hours ago | parent [-]

It's really way more interesting than that.

The act of compression builds up behaviors/concepts of greater and greater abstraction. Another way to think about it is that the model learns to extract commonality, hence the compression. What this means is that because it is learning higher-level abstractions AND the relationships between those abstractions, it can ABSOLUTELY learn to infer or apply things way outside its training distribution.

andy99 3 hours ago | parent | prev | next [-]

No, this describes the common understanding of LLMs and adds little to just calling it AI. Search is the more accurate model for thinking about their actual capabilities and for understanding their weaknesses. “Lossy compression of human knowledge” is marketing.

XenophileJKO 3 hours ago | parent [-]

It is fundamentally and provably different from search, because it captures things along two dimensions that can be used combinatorially to infer the desired behavior for unobserved examples.

1. Conceptual Distillation - proven by research showing that we can find weights/directions that capture or influence outputs aligned with higher-level concepts (a toy sketch of the idea is at the end of this comment).

2. Conceptual Relations - The internal relationships capture how these concepts are related to each other.

This is how the model can perform tasks and infer information way outside of its training data: if the details map to concepts, then the conceptual relations can be used to infer the desired output.

(The conceptual distillation also appears to include meta-cognitive behavior, as evidenced by Anthropic's research. Which makes sense to me: what is the most efficient way to be able to replicate irony and humor for an arbitrary subject? Compressing some spectrum of meta-cognitive behavior...)
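
A toy illustration of point 1, in the spirit of activation-steering work - note the activations below are random placeholders, not taken from any real model:

  # Toy numpy sketch of "conceptual distillation": a concept distilled into a
  # single direction in activation space, computed as a difference of means.
  # Real interpretability work does this with hidden states from an actual model;
  # these are random placeholders just to show the mechanics.
  import numpy as np

  rng = np.random.default_rng(0)
  d = 512                                               # hidden size (arbitrary)
  acts_with_concept = rng.normal(size=(100, d)) + 0.5   # runs exhibiting the concept
  acts_without = rng.normal(size=(100, d))              # neutral runs

  concept_dir = acts_with_concept.mean(axis=0) - acts_without.mean(axis=0)

  def steer(hidden_state, direction, alpha=4.0):
      # Nudge a hidden state along the concept direction at inference time.
      return hidden_state + alpha * direction / np.linalg.norm(direction)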

kylecazar an hour ago | parent [-]

Aren't the conceptual relations you describe still, at their core, just search (even if that's extremely reductive)? We know models can interpolate well, but it's still the same probabilistic pattern matching: they identify conceptual relationships based on associations seen in vast training data. It's my understanding that models are still not at all good at extrapolation - handling data "way outside" of their training set.

Also, I was under the impression LLMs can replicate irony and humor simply because that text has specific stylistic properties and they've been trained on it.

andrei_says_ 4 hours ago | parent | prev | next [-]

“Novel” to the person who has not consumed the training data. Otherwise, just training data combined in highly probable ways.

Not quite autocomplete but not intelligence either.

pc86 4 hours ago | parent | next [-]

What is the difference between "novel" and "novel to someone who hasn't consumed the entire corpus of training data, which is several orders of magnitude greater than any human being could consume?"

adrian_b 3 hours ago | parent | next [-]

The difference is that when you do not know how a problem can be solved, but you know that this kind of problem has been solved countless times before by various programmers, it is likely that if you ask an AI coding assistant for a solution, you will get an acceptable one.

On the other hand, if the problem you have to solve has never been solved before at a quality satisfactory for your purpose, then it is futile to ask an AI coding assistant for a solution, because it is pretty certain that the proposed solution will be unacceptable (unless the AI succeeds in duplicating the performance of a monkey that types out a Shakespearean text by hitting keys at random).

szundi 3 hours ago | parent | prev [-]

[dead]

soulofmischief 4 hours ago | parent | prev [-]

Citation needed that grokked capabilities in a sufficiently advanced model cannot combinatorially lead to contextually novel output distributions, especially with a skilled guiding hand.

arcanemachiner 4 hours ago | parent [-]

Pretty sure the burden of proof is on you here.

soulofmischief 4 hours ago | parent [-]

It's not, because I haven't ruled out the possibility. I could share anecdata about how my discussions with LLMs have led to novel insights, but it's not necessary. I'm keeping my mind open, but you're asserting an unproven claim that is currently not community consensus. Therefore, the burden of proof is on you.

adrian_b 3 hours ago | parent [-]

I agree that after discussions with an LLM you may be led to novel insights.

However, such novel insights are not novel due to the LLM, but due to you.

The "novel" insights are either novel only to you, because they belong to something you have not studied before, or they are novel ideas generated by you as a consequence of your attempts to explain what you want to the LLM.

It is very common to be led to novel insights about something you believed you already understood well only after trying to explain it to another, ignorant human, at which point you may discover that your previous supposed understanding was actually incorrect or incomplete.

soulofmischief 2 hours ago | parent [-]

The point is that the combined knowledge/process of the LLM and a user (which could be another LLM!) led to it walking the manifold in a way that produced a novel distribution for a given domain.

I talk with LLMs for hours out of the day, every single day. I'm deeply familiar with their strengths and shortcomings on both a technical and an intuitive level. I push them to their limits and have definitely witnessed novel output. The question remains: just how novel can this output be? Synthesis is a valid way to produce novel data.

And beyond that, we are teaching these models general problem-solving skills through RL, and it's not absurd to consider the possibility that a good enough training regimen could impart deduction/induction skills powerful enough to produce novel information, even via means other than direct synthesis of existing information. Especially when given affordances such as the ability to take notes and browse the web.

irishcoffee an hour ago | parent [-]

> I push them to their limits and have definitely witnessed novel output.

I’m quite curious what these novel outputs are. I imagine the entire world would like to know of an LLM producing completely new, never-before-created outputs which no human has ever thought of before.

Here is where I get completely hung up. Take 2+2. An LLM has never had two groups of two items and reached the enlightenment of 2+2=4.

It only knows that because it was told that. If enough people start putting 2+2=3 on the internet, who knows what the LLM will spit out. There was that example a while back where an LLM would happily suggest all humans should eat 1 rock a day. Amusingly, even _that_ wasn’t a novel idea for the LLM; it simply regurgitated what it scraped from a website about humans eating rocks. Which leads to the crux: how much patently false information have LLMs scraped and absorbed?

soulofmischief an hour ago | parent [-]

This is not a correct approximation of what happens inside an LLM. They form probabilistic logical circuits that approximate the world they have learned through training. They are not simply recalling stored facts. They are exploiting organically produced circuitry, walking a manifold, which leads to the ability to predict the next state in a staggering variety of contexts.

As an example: https://arxiv.org/abs/2301.05217

It's not hard to imagine that a sufficiently developed manifold could theoretically allow LLMs to interpolate or even extrapolate information that was missing from the training data, but is logically or experimentally valid.
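
For what it's worth, the linked paper studies small transformers grokking modular addition. Here's a rough, hedged sketch of that style of experiment (the architecture and hyperparameters are illustrative placeholders, not the paper's): train on a fraction of all (a, b) pairs and measure accuracy on pairs the model never saw.

  # Rough sketch of a grokking-style experiment on modular addition.
  import torch
  import torch.nn as nn

  p = 97
  pairs = [(a, b) for a in range(p) for b in range(p)]
  perm = torch.randperm(len(pairs))
  split = int(0.3 * len(pairs))                  # train on only 30% of all pairs

  def encode(idx):
      x = torch.zeros(len(idx), 2 * p)
      y = torch.zeros(len(idx), dtype=torch.long)
      for i, j in enumerate(idx):
          a, b = pairs[j]
          x[i, a] = 1.0                          # one-hot slot for a
          x[i, p + b] = 1.0                      # one-hot slot for b
          y[i] = (a + b) % p
      return x, y

  x_tr, y_tr = encode(perm[:split].tolist())
  x_te, y_te = encode(perm[split:].tolist())

  model = nn.Sequential(nn.Linear(2 * p, 256), nn.ReLU(), nn.Linear(256, p))
  opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
  loss_fn = nn.CrossEntropyLoss()

  for step in range(20001):
      opt.zero_grad()
      loss_fn(model(x_tr), y_tr).backward()
      opt.step()
      if step % 2000 == 0:
          with torch.no_grad():
              acc = (model(x_te).argmax(dim=-1) == y_te).float().mean().item()
          print(f"step {step}: accuracy on held-out pairs = {acc:.2f}")

Whether this particular toy actually "groks" depends heavily on the hyperparameters; the point is just the shape of the experiment - fit the training pairs first, then watch whether accuracy on combinations the model never saw eventually jumps.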

emp17344 33 minutes ago | parent [-]

You could find a pre-print on Arxiv to validate practically any belief. Why should we care about this particular piece of research? Is this established science, or are you cherry-picking low-quality papers?

soulofmischief 24 minutes ago | parent [-]

I don't need to reach far to find preliminary evidence of circuits forming in machine learning models. Here's some research from OpenAI researchers exploring circuits in vision models: https://distill.pub/2020/circuits/

Are these enough to meet your arbitrary quality bar?

Circuits are the basis for features. There is still a ton of open research on this subject. I don't care what you care about; the research is still being done, and it's not a new concept.

DebtDeflation 3 hours ago | parent | prev [-]

Information Retrieval followed by Summarization is how I view it.

cultureulterior 3 hours ago | parent | prev | next [-]

This is not true.

johnisgood 5 hours ago | parent | prev [-]

Calling it "just search" is like calling a compiler "just string manipulation". Not false, but aggressively missing the point.

emp17344 5 hours ago | parent | next [-]

No, “just search” is correct. Boosters desperately want it to be something more, but it really is just a tool.

johnisgood 5 hours ago | parent [-]

Yes, it is a tool. No, it is not "just search".

Is your CPU running arbitrary code "just search over transistor states"?

Calling LLMs "just search" is the kind of reductive take that sounds clever while explaining nothing. By that logic, your brain is "just electrochemical gradients".

RhythmFox 5 hours ago | parent | next [-]

I mean, it's actually not a bad metaphor, but how much of a 'search' you could say the CPU is doing among its transistor states does depend on the software you are running. If you are running an LLM, then the metaphor seems very apt indeed.

jvanderbot 4 hours ago | parent | prev [-]

What would you add?

To me it's "search" like a missile does "flight". It's got a target and closed-loop guidance, and is mostly fire-and-forget (for search). At that, it excels.

I think the closed loop plus great summarization is the key to all the magic.

soulofmischief 4 hours ago | parent | next [-]

It's a prediction algorithm that walks a high-dimensional manifold; in that sense, all application of knowledge is just "search". So yes, you're fundamentally correct, but still fundamentally wrong, since you think this foundational truth is the beginning and end of what LLMs do, and thus your mental model does not adequately describe what these tools are capable of.
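
For concreteness, the "prediction algorithm" part really is this small - a minimal sampling loop, using GPT-2 via the transformers library purely as an example (the model and prompt are arbitrary choices, not anything specific to Claude):

  # Minimal autoregressive sampling loop: the model only ever predicts a
  # distribution over the next token; everything else is repeated application.
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  tok = AutoTokenizer.from_pretrained("gpt2")
  model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

  ids = tok("LLMs are just", return_tensors="pt").input_ids
  with torch.no_grad():
      for _ in range(20):
          logits = model(ids).logits[0, -1]        # scores for the next token
          probs = torch.softmax(logits, dim=-1)
          next_id = torch.multinomial(probs, 1)    # one sampled step
          ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
  print(tok.decode(ids[0]))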

jvanderbot 4 hours ago | parent [-]

Me? My mental model? I gave an analogy for Claude, not an explanation of LLMs.

But you know what? I was mentally thinking of both deep think / research and Claude Code, both of which are literally closed-loop. I see this is slightly off topic b/c others are talking about the LLM only.

soulofmischief 4 hours ago | parent [-]

Sorry, I should have said "analogy" and not "mental model", that was presumptuous. Maybe I also should have replied to the GP comment instead.

Anyway, since we're here, I personally think giving LLMs agency helps unlock this latent knowledge, as it provides the agent more mobility when walking the manifold. It has a better chance at avoiding or leaving local minima/maxima, among other things. So I don't know if agentic loops are entirely off-topic when discussing the latent power of LLMs.

bitwize 4 hours ago | parent | prev [-]

Which is kind of funny, because my standard quip is that AI research, beginning in the 1950s/1960s - and indeed much of late-20th-century computer tech, especially along the Boston/SV axis - was funded by the government so that "the missile could know where it is". The DoD wanted smarter ICBMs that could autonomously identify and steer toward enemy targets, and smarter defense networks that could discern a genuine missile strike from, say, 99 red balloons going by.

4 hours ago | parent | prev | next [-]
[deleted]
maxilevi 4 hours ago | parent | prev [-]

I don't mean search in the reductionist way, but rather that it's much better at translating, finding, and mapping concepts when everything is provided vs. creating from scratch. If it could truly think it would be able to bootstrap creations from basic principles like we do, but it really can't. Doesn't mean it's not a great, powerful tool.

ordinaryatom 4 hours ago | parent [-]

> If it could truly think it would be able to bootstrap creations from basic principles like we do, but it really can't.

AlphaZero?

maxilevi 3 hours ago | parent [-]

I just said LLMs

ordinaryatom 2 hours ago | parent [-]

You are right that LLMs and AlphaZero are different models, but given that AlphaZero demonstrated the ability to bootstrap creations, can we really rule out that LLMs also have this ability?

emp17344 2 hours ago | parent [-]

This doesn’t make sense. They are fundamentally different things, so an observation made about AlphaZero does not help you learn anything about LLMs.

ordinaryatom 2 hours ago | parent [-]

I am not sure - self-play with LLM-generated synthetic data is becoming a trendy topic in LLM research.