| ▲ | kybernetikos 14 hours ago |
| Neural networks are universal approximators. The function being approximated in an LLM is the mental process required to write like a human. Thinking of it as an averaging devoid of meaning is not really correct. |
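The universal-approximation claim can be illustrated with a toy sketch (my own, not anything from the thread): a one-hidden-layer tanh network fit to f(x) = x² with plain full-batch gradient descent in NumPy. All sizes and learning rates here are arbitrary illustrative choices.

```python
import numpy as np

# Toy universal-approximation demo: a single hidden layer of tanh units
# fit to f(x) = x^2 on [-1, 1] with full-batch gradient descent (MSE loss).
rng = np.random.default_rng(0)

X = np.linspace(-1.0, 1.0, 100).reshape(-1, 1)   # inputs, shape (100, 1)
T = X ** 2                                        # targets

H_UNITS, LR, STEPS = 16, 0.1, 3000
W1 = rng.normal(0.0, 2.0, (1, H_UNITS))   # wide init spreads the tanh "kinks"
b1 = np.zeros(H_UNITS)
W2 = rng.normal(0.0, 0.5, (H_UNITS, 1))
b2 = 0.0

for _ in range(STEPS):
    H = np.tanh(X @ W1 + b1)             # hidden activations, (100, 16)
    Y = H @ W2 + b2                      # network output, (100, 1)
    dY = 2.0 * (Y - T) / len(X)          # dLoss/dY for mean squared error
    dW2, db2 = H.T @ dY, dY.sum()
    dZ1 = (dY @ W2.T) * (1.0 - H ** 2)   # backprop through tanh
    dW1, db1 = X.T @ dZ1, dZ1.sum(axis=0)
    W1 -= LR * dW1; b1 -= LR * db1
    W2 -= LR * dW2; b2 -= LR * db2

mse = float(np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - T) ** 2))
print(f"final MSE: {mse:.5f}")
```

The point of the sketch is only that minimizing a loss drives an arbitrary smooth function fit; it says nothing by itself about what internal process the fitted function implements.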
|
| ▲ | Terr_ 13 hours ago | parent | next [-] |
| > The function being approximated in an LLM is the mental process required to write like a human. Quibble: That can be read as "it's approximating the process humans use to make data", which I think is a bit reaching compared to "it's approximating the data humans emit... using its own process which might turn out to be extremely alien." |
| |
| ▲ | TeMPOraL 13 hours ago | parent [-] | | Good point. Then again, whatever process we're using, evolution found it in the solution space, using even more constrained search than we did, in that every intermediary step had to be non-negative on the margin in terms of organism survival. Yet find it did, so one has to wonder: if it was so easy for a blind, greedy optimizer to random-walk into human intelligence, perhaps there are attractors in this solution space. If that's the case, then LLMs may be approximating more than merely outcomes - perhaps the process, too. | | |
| ▲ | jayd16 13 hours ago | parent | next [-] | | It's fuzzier than that. Something can be detrimental and survive as long as it's not too detrimental. Plus there's the evolving meta that constantly moves the goalposts. Then there's the billions of years of compute... | |
| ▲ | wavemode 12 hours ago | parent | prev | next [-] | | An easy counterargument is that there are millions of species and an uncountable number of organisms on Earth, yet humans are the only known intelligent ones. (In fact, high intelligence is the only trait humans have that no other organism has.) That could perhaps indicate that intelligence is a bit harder to "find" than you're claiming. | |
| ▲ | adrianN 11 hours ago | parent | prev | next [-] | | Negative mutations can survive for a long time if they're not too bad. For example the loss of vitamin C synthesis is clearly bad in situations where you have to survive without fresh food for a while, but that comes up so rarely that there was little selection pressure against it. | |
| ▲ | thrownthatway 12 hours ago | parent | prev [-] | | > if it was so easy That's one giant leap you've got there. That the probability that intelligent life exists in the universe is 1 says nothing about the ease, or otherwise, with which it came about. By all scientific estimates it took a very long time and faced many hurdles, and by all observational measures it exists nowhere else. Or, what did you mean by easy? |
|
| ▲ | Borealid 14 hours ago | parent | prev | next [-] |
| I don't think of it as "devoid of meaning". It's just curious to me that minimizing a loss function somehow results in sentences that look right but still... aren't. Like the one I quoted. |
| |
| ▲ | kybernetikos 13 hours ago | parent [-] | | A human in school might try to minimise the difference between their grades and the best possible grades. If they're a poor student they might start using more advanced vocabulary, sometimes with an inadequate grasp of when it is appropriate. Because the training process of LLMs is so thoroughly mathematicalised, it feels very different from the world of humans, but in many ways it's just a model of the same kinds of things we're used to. |
|
|
| ▲ | fyredge 13 hours ago | parent | prev | next [-] |
| > Thinking of it as an averaging devoid of meaning is not really correct. To me, this sentence contradicts the sentence before it. What would you say neural networks are then? Conscious? |
| |
| ▲ | kybernetikos 13 hours ago | parent [-] | | They are a mathematical function that has been found during a search that was designed to find functions that produce the same output as conscious beings writing meaningful works. | | |
| ▲ | fyredge 13 hours ago | parent [-] | | Agreed, and to that point, the way to produce such outputs is to absorb a large corpus of words and find the most likely prediction that mimics written language. By virtue of the sheer amount of text it learns from, would you say that the output tends to find the average response based on the text provided? After all, "overfitting" is a well-known concept that ML researchers avoid as a matter of principle. What else could be the case? | | |
| ▲ | kybernetikos 7 hours ago | parent [-] | | I think 'average' is creating a bad intuition here. In order to accurately predict the next word in a human generated text, you need a model of the big picture of what is being said. You need a model of what is real and what is not real. You need a model of what it's like to be a human. The number of possible texts is enormous which means that it's not like you can say "There are lots of texts that start with the same 50 tokens, I'll average the 51st token that appears in them to work out what I should generate". The subspace of human generated texts in the space of all possible texts is extremely sparse, and 'averaging' isn't the best way to think of the process. |
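The sparsity point above can be made concrete with a back-of-the-envelope calculation. The numbers below are illustrative assumptions (GPT-2's vocabulary size, and a generously large corpus), not figures from the thread:

```python
# Back-of-the-envelope: why "average the next token over identical
# 50-token prefixes seen in training" cannot be how generation works.

vocab_size = 50_257      # GPT-2's BPE vocabulary size (illustrative assumption)
prefix_len = 50

# Number of distinct 50-token sequences that could in principle occur.
possible_prefixes = vocab_size ** prefix_len

# A generously large training corpus, ~10 trillion tokens (assumption).
corpus_tokens = 10 ** 13

# Python ints are arbitrary-precision, so this is exact.
print(f"possible 50-token prefixes: ~10^{len(str(possible_prefixes)) - 1}")
print("corpus tokens:              ~10^13")

# Almost every prefix seen at inference time has never appeared verbatim
# in training, so there is no population of continuations to "average".
assert possible_prefixes // corpus_tokens > 10 ** 200
```

The prefix space is around 10^235 sequences against at most ~10^13 training tokens, so exact-match averaging is impossible in principle, which is the sense in which the subspace of human text is "extremely sparse".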
|
|
|
|