| ▲ | aabhay 2 days ago |
| My intuition for this problem is much simpler — assuming there’s some rough hierarchy of concepts, you can guesstimate how many concepts can exist in a 12,000-d space by taking the factorial of the number of dimensions. In that world, each concept is mutually orthogonal with every other concept in at least some dimension. While that doesn’t mean their cosine distance is large, it does mean you’re guaranteed a function that can linearly separate the two concepts. That gives you 12,000! (factorial) concepts in the limit case, more than enough room to fit a taxonomy. |
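A quick numerical sketch of that intuition (not from the comment above; the 1,000 sample concepts, the seed, and the 0.5 threshold are arbitrary illustration): random directions in 12,000 dimensions are nearly orthogonal to one another, and each one can be split off from all the others by a simple hyperplane.

```python
# Sketch only: sample random unit "concept" vectors in a 12,000-d space and check
# (1) pairwise cosine similarities are tiny, and (2) each concept is linearly
# separable from the rest by the hyperplane whose normal is the concept itself.
import numpy as np

d, k = 12_000, 1_000                      # dimensions, number of sample "concepts" (arbitrary)
rng = np.random.default_rng(0)
concepts = rng.standard_normal((k, d))
concepts /= np.linalg.norm(concepts, axis=1, keepdims=True)   # unit vectors

cos = concepts @ concepts.T               # pairwise cosine similarities
off_diag = np.abs(cos[~np.eye(k, dtype=bool)])
print(f"max |cosine| between distinct concepts: {off_diag.max():.3f}")   # typically around 0.04 here

# Hyperplane test: <c_i, x> > 0.5 should accept concept i and reject every other one.
hits = cos > 0.5
print("each concept separated from all others:",
      bool((hits == np.eye(k, dtype=bool)).all()))                       # True here
```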
|
| ▲ | OgsyedIE 2 days ago | parent | next [-] |
| You can only get 12,000! concepts if you pair each concept with an ordering of the dimensions, which models do not do. A vector in a model that has [weight_1, weight_2, ..., weight_12000] is identical to the vector [weight_2, weight_1, ..., weight_12000] within the larger model. Instead, a naive mental model of a language model is to put a positive, negative, or zero trit on each axis: 3^12,000 concepts, which is a much lower number than 12,000!. Then, in practice, almost every vector in the model has all but a few dozen identified axes zeroed because of the limitations of training time. |
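For concreteness, a back-of-the-envelope script comparing the magnitudes in this subthread (a sketch; the cap of 36 nonzero axes is an arbitrary stand-in for "a few dozen"):

```python
# How many decimal digits do these counts have? math.lgamma gives log(n!) without
# ever materializing the factorial.
import math

d = 12_000
print(f"12000!  ~ 10^{math.lgamma(d + 1) / math.log(10):,.0f}")   # ~10^43,741 (orderings model)
print(f"3^12000 ~ 10^{d * math.log10(3):,.0f}")                   # ~10^5,725  (trit-per-axis model)

# "All but a few dozen axes zeroed": trit vectors with at most 36 nonzero entries.
sparse = sum(math.comb(d, k) * 2**k for k in range(37))
print(f"sparse trit vectors (<= 36 nonzero axes) ~ 10^{math.log10(sparse):,.0f}")   # ~10^116
```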
| |
| ▲ | aabhay 2 days ago | parent [-] |
| You’re right. I gave the wrong number. My model implies 2^12000 concepts, because you choose whether or not to include each dimension to form a concept’s subspace. I’m not even referring to the values within that subspace yet, and so once you pick a subspace you still get its degrees of freedom to create a complex manifold. The main value of the mental model is to build an intuition for how “sparse” high-dimensional vectors are without resorting to a 3D sphere. |
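A small sketch of that subspace picture (illustrative numbers only; the 50 axes per concept and the 10,000 sampled pairs are assumptions, not anything stated in the thread): counting subsets of the 12,000 axes gives 2^12000, and two randomly chosen sparse subsets almost never share an axis, which is one way to feel how sparse the space is.

```python
# Sketch of the "choose a subset of axes per concept" intuition.
import math
import random

d, axes_per_concept, trials = 12_000, 50, 10_000     # all illustrative choices
print(f"2^{d} ~ 10^{d * math.log10(2):,.0f}")        # ~10^3,612 possible axis subsets

rng = random.Random(0)
overlaps = [
    len(set(rng.sample(range(d), axes_per_concept)) &
        set(rng.sample(range(d), axes_per_concept)))
    for _ in range(trials)
]
print("mean shared axes between two random concepts:",
      sum(overlaps) / trials)                        # ~ 50*50/12000 = 0.21
```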
|
|
| ▲ | Morizero 2 days ago | parent | prev | next [-] |
| That number is far, far, far greater than the number of atoms in the observable universe: 12,000! ≈ 10^43741, versus ~10^80. |
| |
| ▲ | bmacho 2 days ago | parent | next [-] |
| Say there are 10^80 atoms; then there are something like 2^(10^80) possible things, and 2^(2^(10^80)) groupings/categorizations/orderings of those things, and so on. You can keep going higher, and the number of possibilities goes up really fast. |
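A toy illustration of that iterated power-set growth, necessarily with a tiny set since the real numbers are unwritable:

```python
# The power set of an n-element set has 2^n members; the power set of that has 2^(2^n).
from itertools import chain, combinations

def powerset(xs):
    xs = list(xs)
    return list(chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1)))

atoms = ["a", "b", "c"]                 # stand-in for the 10^80 atoms
things = powerset(atoms)                # 2^3 = 8 "possible things"
groupings = powerset(things)            # 2^8 = 256 groupings of those things
print(len(things), len(groupings))      # 8 256
```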
| ▲ | cleansy 2 days ago | parent | prev | next [-] |
| Not surprising, since concepts are virtual. There is a person; a person with a partner is a couple; a couple with a kid is a family. That’s 5 concepts alone (person, partner, couple, kid, family). |
| ▲ | Sharlin a day ago | parent [-] |
| I’m not sure you grok how big a number 10^43741 is. If we assume that a "concept" is something that can be uniquely encoded as a finite string of English text, you could go up to concepts that are so complex that every single one would take all the matter in the universe to encode (so say 10^80 universes, each with 10^80 particles), and out of 10^43741 concepts you’d still have 10^43741 left undefined. |
| ▲ | jerf a day ago | parent [-] |
| A concept space of 10^43741 needs about 43741*3 bits to identify each concept uniquely (by the information-theoretic notion of a bit, which is more a lower bound on what we traditionally think of as bits in the computer world than an exact match), or about 16000-ish "bytes", which you can approximate reasonably as a "compressed text size". There's a couple orders of magnitude of fiddling around the edges you can do there, but you still end up with human-sized quantities of information identifying specific concepts in a space that size, rather than massively-larger-than-the-universe-sized quantities. Things like novels come from that space. We sample it all the time. Extremely, extremely sparsely, of course. |
| Or to put it another way: in a space of a given size, identifying a specific element takes the log2 of the space's size in bits, not something the size of the space itself. 10^43741 is a very large space by our standards, but its log2 is not impossibly large. |
| If it seems weird for models to work in this space, remember that the models themselves, in their full glory, clock in at multiple hundreds of gigabytes, so the space of possible AIs using this neural architecture is itself 2^trillion-ish, which makes 10^43741 look pedestrian. Understanding how to do anything useful with that amount of possibility is quite the challenge. |
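The arithmetic above, spelled out (a sketch: using the exact log2(10) factor gives roughly 18 KB rather than the rougher 16000-ish bytes, well within the "fiddling around the edges" the comment allows for, and the 200 GB figure is just an assumed stand-in for "multiple hundreds of gigabytes"):

```python
# Bits needed to index one concept out of 10^43741, versus the log2 of the space
# of possible multi-hundred-gigabyte models.
import math

concept_bits = 43_741 * math.log2(10)                         # log2(10^43741)
print(f"bits to pick one concept: ~{concept_bits:,.0f}")      # ~145,000 bits
print(f"as bytes:                 ~{concept_bits / 8:,.0f}")  # ~18,000 bytes, i.e. short-story sized

model_bits = 200e9 * 8                                        # assumed 200 GB of parameters
print(f"log2 of the model space:  ~{model_bits:,.0f} bits (the '2^trillion-ish')")
```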
|
| |
| ▲ | am17an 2 days ago | parent | prev [-] |
| Somehow that's still an understatement. |
|
|
| ▲ | bjornsing 2 days ago | parent | prev [-] |
| > While that doesn’t mean their cosine distance is large |
| There’s a lot of devil in this detail. |
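One concrete version of that devil (a toy example, not anyone's actual claim): two vectors can be orthogonal along a particular axis, in the sense that one is zero there and the other is not, and still point in almost exactly the same direction.

```python
# Differing along one axis does not imply a large cosine distance.
import numpy as np

d = 12_000
rng = np.random.default_rng(0)
a = rng.standard_normal(d)
a[0] = 5.0                     # a clearly "uses" axis 0
b = a.copy()
b[0] = 0.0                     # b does not, so a threshold on axis 0 separates them

cos_sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"cosine similarity: {cos_sim:.4f}")   # ~0.999: they disagree on axis 0 yet are barely apart in angle
```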