aatd86 15 hours ago

My own little insights and ramblings as an uninitiated quack (just spent the night asking Claude to explain machine learning to me):

Seems that we are learning in layers, one of the first being a 2D neural net (images) augmented by other sensory data to create a 3D, if not 4D, model (neural net). HRTFs for sound increase the spatial information we get from images. With depth coming from sound and light and learned movements (touch), we seem to develop a notion of space and time. (Multimodality?)

Seems that we can take low dimensional inputs and correlate them to form higher dimensional structures.

Of course, physically it comes from noticing the dampening of visual data (focus, for example) and memorized audio data (sound frequency and amplitude, early reflections, the Doppler effect, etc.). That should be emergent from training.

Those data sources can be imperfectly correlated. That's why we count during a lightning storm to estimate distance. It's low dimensional.
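
As a rough sketch of that low-dimensional heuristic (assuming sound travels at roughly 343 m/s; the function name is just for illustration):

```python
# Thunder-counting heuristic: the flash arrives essentially instantly,
# so the flash-to-thunder delay is distance divided by the speed of sound.
SPEED_OF_SOUND_M_PER_S = 343.0  # roughly, at sea level and ~20 degrees C

def lightning_distance_km(seconds_counted: float) -> float:
    """Estimate distance to a lightning strike from the counted delay."""
    return seconds_counted * SPEED_OF_SOUND_M_PER_S / 1000.0

print(lightning_distance_km(5.0))  # ~1.7 km, i.e. roughly one mile per 5 seconds counted
```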

In a sense, it's perhaps a measure of required effort (distance to somewhere).

What's funny is that it seems to go the other way from traditional training, where we move from higher-dimensional tensor spaces to lower ones. At least as a first step.

Flamentono2 10 hours ago | parent

It's hard to follow what you're trying to communicate, at least in the last half.

Nonetheless, yes, we do know of certain brain structures along the lines of your image-net analogy, but the way you describe it sounds a little bit off.

Our visual cortex is not 'just a layer'; it's a component, I would say, and it's optimized for detecting things.

Other components act differently with different structures.

epr 8 hours ago | parent

A bit confusing for sure, but I think (not sure) I get what they're saying. Training a NN (for visual tasks at least) consists of training a model with many more dimensions (params) than the input space (e.g., controller inputs + Atari pixels). This contrasts with a lot of what humans do, which is to take higher-dimensional information (tons of data per second combining visual, audio, touch/vibration, etc.) and synthesize much lower-dimensional models / heuristics / rules of thumb, like the example they give of the five-seconds-per-mile rule for thunder.
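
For a toy illustration of that gap (a hypothetical sketch, assuming a 210x160 RGB Atari frame and an arbitrary small PyTorch conv net; the layer sizes are made up, not any particular published architecture):

```python
import torch
import torch.nn as nn

# Input: one Atari-style frame, 210 x 160 RGB pixels -> ~100k input dimensions.
frame = torch.zeros(1, 3, 210, 160)
input_dims = frame.numel()

# A small, arbitrary conv net; layer sizes are purely illustrative.
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
    nn.Flatten(),
    nn.LazyLinear(512), nn.ReLU(),
    nn.Linear(512, 18),          # e.g. 18 discrete controller actions
)
model(frame)                     # one forward pass so the lazy layer materializes

param_count = sum(p.numel() for p in model.parameters())
print(f"input dimensions: {input_dims:,}")   # 100,800
print(f"model parameters: {param_count:,}")  # ~14 million -- far more than the input
```

Even this tiny network has two orders of magnitude more parameters than input dimensions, whereas the thunder heuristic compresses a torrent of sensory input down to a single number.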