fxj 5 hours ago |
The question is: is it like JPEG compression, where the errors do not accumulate and the image converges to a fixed point of the compression (an image that re-compresses to itself), or does the data set converge to a single point that is meaningless?
rapatel0 4 hours ago | parent |
The transformation function in JPEG (the DCT) is well-defined math. While the codec is lossy, most of the information is reproducible. An LLM, by contrast, is layers and layers of non-linear transformations, and it's hard to say exactly how information accumulates through them. You can inspect the activations for individual tokens, but it's really not clear how to define what the function is actually doing. Therefore the error is poorly understood.
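To see why JPEG-style error doesn't accumulate, here's a toy sketch (not real JPEG: no 8x8 blocks, no zigzag or entropy coding, and a single uniform quantization step `step` instead of a quantization table). Since the DCT is exactly invertible, quantizing the coefficients is the only lossy step, and re-compressing an already-compressed signal quantizes coefficients that are already on the grid, so the output is a fixed point of the map:

```python
import math

def dct(x):
    # Forward DCT-II over the whole signal
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N)
                for n in range(N)) for k in range(N)]

def idct(X):
    # Inverse (DCT-III with the usual scaling); idct(dct(x)) == x up to float error
    N = len(X)
    return [(X[0] / 2 + sum(X[k] * math.cos(math.pi * (n + 0.5) * k / N)
                            for k in range(1, N))) * 2 / N for n in range(N)]

def compress(x, step=10.0):
    # The only lossy step: snap each DCT coefficient to a multiple of `step`,
    # a stand-in for JPEG's quantization stage
    quantized = [round(c / step) * step for c in dct(x)]
    return idct(quantized)

signal = [52, 55, 61, 66, 70, 61, 64, 73]  # one "row" of pixel values
once = compress(signal)    # lossy: differs from the input
twice = compress(once)     # near-identical to `once`: errors don't compound
```

The first pass loses information, but every later pass reproduces the previous output almost exactly. The original question is whether an LLM retrained on its own output behaves like `compress` (converging to a stable fixed point) or keeps drifting.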