dataviz1000 | 4 hours ago
Here is a 3blue1brown short about the relationships between words in a high-dimensional vector space. [0] To show this conceptually to a human, the dimensions have to be reduced from 10,000 or 20,000 down to 3.

To make the model's thinking human-understandable, researchers reward not just the correct answer at the end during training; they also seed the beginning with structured chains of thinking tokens and reward the format of the thinking output. The thinking tokens do just a handful of things: verification, backtracking, scratchpad or state management (like you doing multiplication on paper instead of in your head), decomposition (breaking the problem into smaller parts, which is most of what I see thinking output do), and self-criticism.

An example: a math problem solved by an Italian and another by a German might cause those geographic regions to become associated with the solution across the 20,000 dimensions. So if mentioning them yields more accurate answers in training, they will show up in the gibberish, unless the model has been trained to produce much more sensical, human-readable output (like the 3 dimensions) instead. It has also been observed that a model will sometimes write perfectly normal-looking English sentences that secretly contain hidden codes for itself in the way the words are spaced or chosen.
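A minimal sketch of what "rewarding the format of the thinking output" could look like during training. The `<think>`/`<answer>` tag names and the 0.5 weighting are assumptions for illustration, not from any particular paper:

```python
import re

# Hypothetical format reward used alongside the answer reward in RL
# training: the model earns partial credit simply for emitting its
# reasoning inside structured <think>...</think> tags, nudging the
# thinking tokens toward a human-readable form.
THINK_RE = re.compile(
    r"^<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*$", re.DOTALL
)

def format_reward(completion: str) -> float:
    """1.0 if the completion follows the required layout, else 0.0."""
    return 1.0 if THINK_RE.match(completion) else 0.0

def total_reward(completion: str, correct_answer: str) -> float:
    # Combine correctness of the final answer with adherence to the
    # format; the 0.5 weighting is an illustrative choice.
    m = THINK_RE.match(completion)
    answer_ok = bool(m) and m.group(2).strip() == correct_answer
    return float(answer_ok) + 0.5 * format_reward(completion)

good = "<think>2 * 21 = 42</think><answer>42</answer>"
bad = "The answer is 42."
print(total_reward(good, "42"))  # 1.5: correct answer + well-formed thinking
print(total_reward(bad, "42"))   # 0.0: no structured thinking block
```

A model trained against a reward like this has no incentive to keep its reasoning legible beyond what the format check enforces, which is exactly the loophole the hidden-code observation above points at.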