deadbabe 3 days ago:
| Form ideas without the use of language. For example: imagining how you would organize a cluttered room. |
|
Chabsff 3 days ago:
Ok, but how do you go about measuring whether a black box is doing that or not? We don't apply that criterion when evaluating animal intelligence. We sort of take it for granted that humans at large do it, but not via any test that would satisfy an alien. Why should we impose white-box constraints on machine intelligence when we can't do so for any other kind?

deadbabe 3 days ago:

There is truly no such thing as a “black box” when it comes to software, there is only a limit to how much patience a human will have in understanding the entire system in all its massive complexity. It’s not like an organic brain.

Chabsff 3 days ago:

The black box I'm referring to is us. You can't have it both ways. If your test for whether something is intelligent/thinking or not isn't applicable to any known form of intelligence, then what you are testing for is not intelligence/thinking.

holmesworcester 3 days ago:

You wouldn't say this about a message encrypted with AES though, since there's not just a "human patience" limit but also a (we are pretty sure) unbearable computational cost.

We don't know, but it's completely plausible that we might find that the cost of analyzing LLMs in their current form, to the point of removing all doubt about how/what they are thinking, is also unbearably high.

We also might find that it's possible for us (or for an LLM training process itself) to encrypt LLM weights in such a way that the only way to know anything about what it knows is to ask it.
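
To put a rough number on "unbearable" in the AES case, here is a back-of-the-envelope sketch; the 10^12 guesses per second rate is an assumption, chosen to be generous:

```python
# Cost of brute-forcing a single AES-128 key, assuming an implausibly
# generous 10^12 guesses per second (the rate is an assumption).
keyspace = 2 ** 128
guesses_per_second = 10 ** 12
seconds_per_year = 60 * 60 * 24 * 365

years_to_exhaust = keyspace / (guesses_per_second * seconds_per_year)
print(f"~{years_to_exhaust:.1e} years to try every key")  # on the order of 1e19 years
```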

mstipetic 3 days ago:

Just because it runs on a computer doesn’t mean it’s “software” in the common meaning of the word.
|
embedding-shape 3 days ago:
> Form ideas without the use of language.

Don't LLMs already do that? "Language" is just something we've added as a later step in order to understand what they're "saying" and "communicate" with them; otherwise they're just dealing with floats with different values, in different layers, essentially (and grossly over-simplified, of course).
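
For concreteness, a minimal sketch (assuming the Hugging Face transformers library and the small "gpt2" checkpoint) of what that looks like from the inside: everything between the tokenizer and the step that maps outputs back into text is just layers of float tensors.

```python
# Minimal sketch, assuming the Hugging Face transformers library and the
# small "gpt2" checkpoint: the model's internal states are layers of floats;
# language only appears at the tokenize/decode boundary.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

inputs = tok("a duck on a pond", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# One tensor of floats per layer; nothing in here looks like words.
for i, h in enumerate(out.hidden_states):
    print(f"layer {i:2d}: shape {tuple(h.shape)}, dtype {h.dtype}")
```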

heyjamesknight 3 days ago:

But language is the input and the vector space within which their knowledge is encoded and stored. They don't have a concept of a duck beyond what others have described the duck as.

Humans got by for millions of years with our current biological hardware before we developed language. Your brain stores a model of your experience, not just the words other experiencers have shared with you.
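
That "knowledge encoded in the vector space" point can be made concrete with a small embedding experiment; this sketch assumes the sentence-transformers library and its all-MiniLM-L6-v2 model, and simply shows that "duck" lands nearer to "goose" than to "tractor" purely because of how people have written about them:

```python
# Sketch, assuming the sentence-transformers library: knowledge about ducks,
# as encoded from text alone, shows up as geometry in the embedding space.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
duck, goose, tractor = model.encode(["a duck", "a goose", "a tractor"])

print("duck vs goose:  ", util.cos_sim(duck, goose).item())    # relatively high
print("duck vs tractor:", util.cos_sim(duck, tractor).item())  # relatively low
```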

embedding-shape 3 days ago:

> But language is the input and the vector space within which their knowledge is encoded and stored. They don't have a concept of a duck beyond what others have described the duck as.

I guess if we limit ourselves to single-modality LLMs, yes, but nowadays we have multimodal ones, which could think of a duck in terms of language, visuals, or even audio.

deadbabe 3 days ago:

You don’t understand. If humans had no words to describe a duck, they would still know what a duck is. Without words, LLMs would have no way to map an encounter with a duck to anything useful.

embedding-shape 2 days ago:

Which makes sense for text LLMs, yes, but what about LLMs that deal with images? How can you tell they wouldn't work without words? It just happens to be words we use for interfacing with them, because it's easy for us to understand, but internally they might be conceptualizing things in a multitude of ways.

heyjamesknight 2 days ago:

Multimodal models aren't really multimodal. The images are mapped to words and then the words are expanded upon by a single-mode LLM.

If you didn't know the word "duck", you could still see the duck, hunt the duck, use the duck's feathers for your bedding, and eat the duck's meat. You would know it could fly and swim without having to know what either of those actions were called.

The LLM "sees" a thing, identifies it as a "duck", and then depends on a single-modal LLM to tell it anything about ducks.

embedding-shape 2 days ago:

> Multimodal models aren't really multimodal. The images are mapped to words and then the words are expanded upon by a single-mode LLM.

I don't think you can generalize like that. It's a big category, and not all multimodal models work the same; it's just a label for a model that handles multiple modalities, after all, not a specific machine-learning architecture.
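
For what it's worth, a common counterexample to the "images are mapped to words" description is the LLaVA-style design, sketched below with made-up dimensions in plain PyTorch: image patch features are projected directly into the LLM's token-embedding space and attended over alongside text embeddings, with no intermediate words at all.

```python
# Toy sketch (made-up dimensions, not a real model) of a LLaVA-style design:
# image patch features are projected straight into the LLM's embedding space
# and concatenated with text token embeddings -- no intermediate words.
import torch
import torch.nn as nn

d_model = 4096      # hypothetical LLM hidden size
vision_dim = 1024   # hypothetical vision-encoder feature size

projector = nn.Linear(vision_dim, d_model)        # learned image -> embedding bridge

image_patches = torch.randn(1, 576, vision_dim)   # e.g. 24x24 grid of patch features
text_embeddings = torch.randn(1, 20, d_model)     # already-embedded text tokens

image_tokens = projector(image_patches)           # now in the same space as text
sequence = torch.cat([image_tokens, text_embeddings], dim=1)
print(sequence.shape)  # the transformer attends over this mixed sequence directly
```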
|
deadbabe 3 days ago:

LLMs don’t form ideas at all. They search vector space and produce output; sometimes the output can resemble ideas if you loop it back into itself.
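
For reference, a literal "search of vector space" in the retrieval sense looks something like the toy sketch below (nearest-neighbour lookup over a handful of made-up embeddings); whether that is a fair description of what an LLM's forward pass does is exactly what's disputed in this thread.

```python
# Toy sketch of a literal vector-space search: nearest-neighbour lookup over
# a few fake embeddings. Illustrative only; not a claim about how LLMs work.
import numpy as np

vocab = ["duck", "goose", "tractor"]
embeddings = np.random.default_rng(0).normal(size=(3, 8))  # fake 8-d vectors

def nearest(query_vec):
    # cosine similarity against every stored vector, return the best match
    sims = embeddings @ query_vec / (
        np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query_vec)
    )
    return vocab[int(np.argmax(sims))]

# A lightly perturbed "duck" vector should still map back to "duck".
print(nearest(embeddings[0] + 0.1))
```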
|
|
tim333 3 days ago:
| Genie 3 is along the lines of ideas without language. It doesn't declutter though, I think. https://youtu.be/PDKhUknuQDg |