srean 15 hours ago
People go all dopey-eyed about "frequency space", but that's a red herring. The takeaway should be that a problem-centric coordinate system is enormously helpful. After all, what Copernicus showed is that the mind-bogglingly complicated motion of the planets becomes a whole lot simpler if you change the coordinate system. The Ptolemaic model of epicycles was an ad hoc form of Fourier analysis: decomposing periodic motion into circles upon circles.

Back to frequencies: there is nothing obviously frequency-like in real-space Laplace transforms*. The real insight is that differentiation and integration become simple if the coordinates used are exponential functions, because exponential functions remain (scaled) exponentials when passed through those operations.

For digital signals what helps is the Walsh-Hadamard basis. Those functions are not like frequencies, and not at all like square-wave analogues of sinusoidal waves. People call theirs "sequency space", a well-justified pun.

My suspicion is that we are in a Ptolemaic state as far as GPT-like models are concerned. We will eventually understand them better once we figure out the better coordinate system in which to think about their dynamics.

* There is a connection, though, through the exponential form of complex numbers, or more prosaically: when multiplying rotation matrices, the angles combine additively. So angles and logarithms have a certain unity of character.
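The eigenfunction point ("exponentials come back scaled") is easy to check numerically; a minimal sketch, where the value of s and the grid are my own arbitrary choices:

```python
import numpy as np

# Numerical check that exponentials are eigenfunctions of d/dx:
# differentiating exp(s*x) gives back s * exp(s*x), i.e. the function
# returns merely scaled, which is why exponential coordinates simplify
# differential operators. The value of s and the grid are arbitrary.
s = -2.0
x = np.linspace(0.0, 1.0, 10_001)
f = np.exp(s * x)
df = np.gradient(f, x)        # second-order accurate numerical derivative
ratio = df[1:-1] / f[1:-1]    # interior points avoid one-sided edge error
assert np.allclose(ratio, s, atol=1e-3)
```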
madhadron 13 hours ago
All these transforms are switching to an eigenbasis of some differential operator (one that usually corresponds to a differential equation of interest): spherical harmonics, Bessel and Hankel functions (the radial analogues of sines/cosines and complex exponentials, respectively), and so on.

The next big jump was to collections of functions not parameterized by subsets of R^n. Wavelets use a tree-shaped parameter space. There's also a whole interesting area of overcomplete basis sets that I have been meaning to look into, where you give up orthogonality and all its nice properties in exchange for having multiple options that adapt better to different signal characteristics.

I don't think these transforms are going to be relevant to understanding neural nets, though. Neural nets are, by their nature, doing something with nonlinear structures in high dimensions that are not smoothly extended across their domain, which is the opposite of the problem all our current approaches to functional analysis deal with.
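The "eigenbasis of a differential operator" point has a clean discrete analogue: the DFT diagonalizes the periodic second-difference operator. A small sketch (grid size n and frequency index k are my own arbitrary choices):

```python
import numpy as np

# The periodic second-difference operator (a discrete Laplacian) is
# diagonalized by the DFT basis: every complex-exponential column is an
# eigenvector. Grid size n and frequency index k are arbitrary choices.
n = 8
L = -2 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
L[0, -1] = L[-1, 0] = 1.0     # wrap around: periodic boundary

k = 3                                            # any index 0..n-1 works
v = np.exp(2j * np.pi * k * np.arange(n) / n)    # k-th DFT column
lam = 2 * np.cos(2 * np.pi * k / n) - 2          # its known eigenvalue
assert np.allclose(L @ v, lam * v)
```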
anamax 3 hours ago
> My suspicion is that we are in Ptolemaic state as far as GPT like models are concerned. We will eventually understand them better once we figure out what's the better coordinate system to think about their dynamics in.

Most deep learning systems are learned matrices that are multiplied by "problem-instance" data matrices to produce a prediction matrix. The time to do that matrix multiplication is data-independent (assuming the time to do multiply-adds is data-independent).

If you multiply both sides by the inverse of the learned matrix, you get an equation where finding the prediction matrix is a solving problem, and the time to solve is data-dependent. Interestingly, that time is roughly proportional to the difficulty of the problem for that data.

Perhaps more interesting is that the inverse matrix seems to have row artifacts that look like things in the training data.

These observations are due to Tsvi Achler.
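A hypothetical toy version of the forward-pass vs. solve contrast (my own illustrative setup, not Achler's actual construction): the matmul y = W @ x costs the same for every input, while recovering x from y by iteration takes a number of steps that depends on the data.

```python
import numpy as np

# Toy contrast: forward pass y = W @ x has data-independent cost, but
# solving W x = y iteratively takes a data-dependent number of steps.
rng = np.random.default_rng(0)
n = 50
W = np.eye(n) + 0.1 * rng.standard_normal((n, n))  # well-conditioned "learned" map

def solve_iteratively(y, tol=1e-8, max_iter=10_000):
    """Richardson iteration for W x = y; returns (x, steps taken)."""
    x = np.zeros_like(y)
    for step in range(1, max_iter + 1):
        r = y - W @ x
        if np.linalg.norm(r) < tol:
            return x, step
        x = x + 0.5 * r            # damped update; converges for this W
    return x, max_iter

y1 = W @ np.ones(n)
y2 = W @ rng.standard_normal(n)
x1, steps1 = solve_iteratively(y1)
x2, steps2 = solve_iteratively(y2)
# steps1 and steps2 generally differ; the matmul cost of W @ x does not
```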
RossBencina 3 hours ago
> exponential functions remain (scaled) exponential when passed through such operations.

See also: eigenvalue, differential operator, diagonalisation, modal analysis.
alexlesuper 15 hours ago
I feel like this is the way we should have learned Fourier and Laplace transforms in my DSP class, not just by blindly applying formulas and equations.
Xcelerate 13 hours ago
It's kind of intriguing that predicting the future state of any quantum system becomes almost trivial, assuming you can diagonalize the Hamiltonian. But good luck with that in general. (In other words, a "simple" reference frame always exists via unitary conjugation, but finding it is very difficult.)
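A minimal sketch of the "trivial once diagonalized" point, using a tiny Pauli-X Hamiltonian as the example (with hbar set to 1):

```python
import numpy as np

# Once H is diagonalized, time evolution exp(-i H t) reduces to phases
# on the eigenvalues (hbar = 1). 2x2 example with a Pauli-X Hamiltonian,
# which is easy to diagonalize; in general this step is the hard part.
H = np.array([[0.0, 1.0],
              [1.0, 0.0]])                 # Pauli-X
E, V = np.linalg.eigh(H)                   # H = V @ diag(E) @ V^dagger

def evolve(psi, t):
    # rotate into the eigenbasis, apply the phases, rotate back
    return V @ (np.exp(-1j * E * t) * (V.conj().T @ psi))

psi0 = np.array([1.0, 0.0])                # start in |0>
psi = evolve(psi0, np.pi / 2)              # exp(-i X pi/2) = -i X
assert np.allclose(np.abs(psi) ** 2, [0.0, 1.0])   # fully flipped to |1>
```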