srean a day ago

You may well be right about neural networks. Sometimes models that seem nonlinear turn linear if those nonlinearities are pushed into the basis functions, so one can still hope.

For GPT-like models, I see sentences as trajectories in the embedding space. These trajectories look quite complicated, with nothing obviously structured about them from a geometrical standpoint. My hope is that if we get the coordinate system right, we may see something more intelligible going on.

This is just a hope, a mental bias. I do not have any solid argument for why it should be as I describe.

nihzm a day ago | parent | next

> Sometimes models that seem nonlinear turn linear if those nonlinearities are pushed into the basis functions, so one can still hope.

That idea was pushed to its limit by Koopman operator theory. The argument sounds quite good at first, but unfortunately it can't really work for all cases in its current formulation [1].

[1]: https://arxiv.org/abs/2407.08177
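
A standard textbook illustration of the lifting (not taken from [1], just the usual toy example): the nonlinear system

    dx1/dt = mu * x1
    dx2/dt = lambda * (x2 - x1^2)

becomes exactly linear in the observables y1 = x1, y2 = x2, y3 = x1^2:

    dy1/dt = mu * y1
    dy2/dt = lambda * y2 - lambda * y3
    dy3/dt = 2 * mu * y3

The catch is that an exact finite lifting like this exists only for rather special systems; in general the required space of observables is infinite dimensional.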

srean a day ago | parent

Quite so. Quite so indeed.

We know that under benign conditions an infinite-dimensional basis must exist, but finding it from finite samples is very non-trivial; we don't know how to do it in the general case.
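
For what it's worth, the data-driven attempts I know of (e.g. EDMD) just fix a finite dictionary of observables up front and fit the best linear map on it by least squares. A minimal sketch, with a made-up scalar system and a hand-picked dictionary:

    import numpy as np

    # Made-up nonlinear map: x_{k+1} = f(x_k)
    def f(x):
        return 0.9 * x - 0.2 * x**2

    # Hand-picked dictionary of observables psi(x) = (x, x^2)
    def psi(x):
        return np.array([x, x**2])

    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, 200)       # sampled states
    Y = f(X)                              # their images under the dynamics

    PsiX = np.stack([psi(x) for x in X])  # lifted states, shape (200, 2)
    PsiY = np.stack([psi(y) for y in Y])

    # Least-squares fit of a matrix K with PsiX @ K ~= PsiY,
    # i.e. a finite-dimensional approximation of the Koopman operator.
    K, *_ = np.linalg.lstsq(PsiX, PsiY, rcond=None)

    # Note: psi(f(x)) involves x^3 and x^4, which are not in the dictionary,
    # so K can only ever be approximate here; choosing a dictionary that is
    # actually invariant is the hard part.
    print(K)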

madhadron a day ago | parent | prev

I’m not sure what you mean by a change of basis making a nonlinear system linear. A linear system is one where solutions add as elements of a vector space. That’s true no matter what basis you express it in.

srean 13 hours ago | parent

It depends on the parameterization.

For example, if you parameterize the x, y coordinates of a planar circular trajectory in terms of the angle theta, they are nonlinear functions of theta.

However, if you parameterize the point in terms of the tuple (cos theta, sin theta), it comes out as a scaled sum. Here we have pushed the nonlinear functions cos and sin inside the basis functions.
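
Concretely, writing it out with a radius r:

    x(theta) = r * cos(theta)    (nonlinear in theta)
    y(theta) = r * sin(theta)

but in the basis b1 = cos(theta), b2 = sin(theta) the same trajectory reads

    x = r * b1 + 0 * b2          (linear in b1, b2)
    y = 0 * b1 + r * b2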

A conic section is a nonlinear curve (not a line) when considered in the variables x and y. However, in the basis x^2, xy, y^2, x, y it's linear (well, technically affine).
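
Spelled out, the general conic with coefficients a..f,

    a*x^2 + b*x*y + c*y^2 + d*x + e*y + f = 0

is an affine equation in the lifted coordinates z = (x^2, xy, y^2, x, y): a fixed linear functional of z plus the constant f.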

Consider the Naive Bayes classifier. It looks nonlinear until one parameterizes it in log p; then it's linear in the log-probabilities and log-odds.
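
To make that explicit for two classes and features x_1, ..., x_n, the conditional-independence assumption gives

    log [ P(C=1|x) / P(C=0|x) ]
        = log [ P(C=1) / P(C=0) ] + sum_i log [ P(x_i|C=1) / P(x_i|C=0) ]

so the log-odds is a plain sum: a linear function of the per-feature log-likelihood ratios, even though the posterior itself is a nonlinear function of the underlying probabilities.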

If one is ok with an infinite-dimensional basis, this linearisation idea can be pushed much further. Take a look at this if you are interested:

https://math.stackexchange.com/questions/4471490/a-proper-ap...