▲ | WithinReason 6 days ago
the sensors are 2D
|
| ▲ | soulofmischief 6 days ago | parent | next [-] |
Two of them, giving us stereo vision. We are provided visual cues that encode depth. The ideal world model would at least have this. A world model for a video game on a monitor might be able to get away with no depth information, but a) normal engines do have this information and it would make sense to provide as much data to a general model as possible, and b) the models wouldn't work on AR/VR. Training on stereo captures seems like a win all around.
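A minimal sketch of the depth cue in question, assuming OpenCV: classic block-matching disparity, with depth recovered as Z = f*B/d. The file names, focal length f, and baseline B below are placeholders, not anything from the thread.

    import cv2
    import numpy as np

    # Hypothetical rectified stereo pair; file names are placeholders.
    left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
    right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

    # OpenCV's block matcher finds per-pixel disparity d (in pixels).
    matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    disparity = matcher.compute(left, right).astype(np.float32) / 16.0  # fixed-point -> px

    # Depth from disparity: Z = f * B / d. f (px) and B (m) are assumed values,
    # B ~ a human interpupillary distance.
    f, B = 700.0, 0.065
    depth = np.where(disparity > 0, f * B / disparity, np.inf)  # invalid pixels -> inf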
| |
▲ | WithinReason 6 days ago | parent [-]
> We are provided visual cues that encode depth. The ideal world model would at least have this.

None of these world models have explicit concepts of depth or 3D structure, and adding it would go against the principle of the Bitter Lesson. Even with 2 stereo captures there is no explicit 3D structure.
▲ | soulofmischief 6 days ago | parent [-]
Increasing the fidelity and richness of training data does not go against the Bitter Lesson. The model can learn 3D representation on its own from stereo captures, but there is still richer, more connected data to learn from with stereo captures vs. monocular captures. This is unarguable. You're needlessly making things harder by forcing the model to also learn to estimate depth from monocular images, and robbing it of a channel for error correction in the case of faulty real-world data.
▲ | WithinReason 6 days ago | parent [-]
Stereo images have no explicit 3D information; they are just 2D sensor data. But even if you wanted to use stereo data, you would restrict yourself to stereo datasets and wouldn't be able to train on the 99.9% of video out there that wasn't captured in stereo. That's the part that's against the Bitter Lesson.
▲ | soulofmischief 6 days ago | parent | next [-]
You don't have to restrict yourself to that; you can create synthetic data or just train on both kinds of data (as sketched below). I still don't understand what the Bitter Lesson has to do with this. First, it's only a piece of writing, not dogma; second, it concerns itself with algorithms and model structure, and increasing the amount of data available to train on does not conflict with it.
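A sketch of the "train on both kinds of data" point, under assumed names: mono clips simply lack the second view, and the model masks that view out of its loss.

    import random

    # Hypothetical sampler mixing stereo pairs and mono clips; every name
    # here is illustrative, not from any real pipeline.
    def sample_views(stereo_clips, mono_clips, p_stereo=0.5):
        if stereo_clips and random.random() < p_stereo:
            left, right = random.choice(stereo_clips)
            return left, right  # both views supervise the model
        # Mono clip: the second view is absent, so its loss term is masked
        # and the model must infer depth from the single view alone.
        return random.choice(mono_clips), None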
|
| ▲ | reactordev 6 days ago | parent | prev | next [-] |
Incorrect. My sense of touch can be activated in 3 dimensions by placing my hand near a heat source, which radiates in 3 dimensions.
| |
▲ | Nevermark 6 days ago | parent [-]
You are still sensing heat across 2 dimensions of skin. The 3rd dimension gets inferred from that data. (Unless you have a supernatural sensory aura!)
▲ | AIPedant 6 days ago | parent [-]
The point is that knowing where your hand is in space relative to the rest of your body is a distinct sense which is directly three-dimensional. This information is not inferred; it is measured with receptors in your joints and ligaments.
▲ | Nevermark 6 days ago | parent [-]
No, it is inferred. You are inferring 3D positions based on many sensory signals combined, from mechanoreceptors and proprioceptors located in our skin, joints, and muscles. We don't have 3-element position sensors, nor do we have 3D sensor volumes, in terms of how information is transferred to the brain, which is primarily in 1D (audio) or 2D (sensory surface) layouts. From that we learn a sense of how our body is arranged very early in life.

EDIT: I was wrong about one thing. Muscle nerve endings are distributed throughout the muscle volume. So 3D positioning is not sensed, but we do have sensor locations distributed in rough and malleable 3D topologies. Those don't give us any direct 3D positioning; in fact, we are notoriously bad at knowing which individual muscles we are using, much less what feelings correspond to what 3D coordinate within each specific muscle. But we do learn to identify anatomical locations and then infer positioning from all that information.
▲ | reactordev 6 days ago | parent [-]
Your analysis is incorrect again. Having sensors spread out across a volume is, by definition, measuring 3D space. It's a volume, not a surface. Humans are actually really good at knowing which muscles we are using; it's called body sculpting, lifting, bodybuilding, and all of that. So nice try.
▲ | Nevermark 5 days ago | parent [-]
Ah, good point. 3D in terms of anatomy, yes. Then the mapping of those sensors to the body's anatomical state in 3D space is learned. There are a surprising number of kinds of dimension involved in categorizing sensors.
▲ | reactordev 5 days ago | parent [-]
Agreed :) It doesn't make it any less 3D, though. It's the additive sensing of all sensors within a region that gives you that perception. Fascinating stuff.
|
| ▲ | echelon 6 days ago | parent | prev | next [-] |
The GPCRs [1] that do most of our sense signalling are each individually complicated machines. Many of our signalling pathways are constitutively "on" and are instead suppressed by detection. Ligand binding, suppression, the signalling cascade, all sorts of encoding, ...

In any case, when all of our senses are integrated, we have rich n-dimensional input:

- stereo vision for depth
- monocular optical cues (shading, parallax, etc.)
- proprioception
- vestibular sensing
- binaural hearing
- time

I would not say that we sense in three dimensions. It's much more.

[1] https://en.m.wikipedia.org/wiki/G_protein-coupled_receptor
|
| ▲ | 2OEH8eoCRo0 6 days ago | parent | prev | next [-] |
And the brain does sensor fusion to build a 3D model that we perceive. We don't perceive in 2D. There are other sensors as well: is the inner ear a 2D sensor?
| |
▲ | AIPedant 6 days ago | parent [-]
Inner ear is a great example! I mentioned in another comment that if you want to be reductive, the sensors in the inner ear (the hairs themselves) are one-dimensional, but the overall sense is directly three-dimensional. (In a way it's six-dimensional, since it includes direct information about angular velocity, but I don't think it actually has six independent degrees of freedom. E.g. it might be hard to tell the difference between spinning right-side-up and upside-down with only the inner ear; you'd need additional sense information.)
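For intuition, the combined organ is loosely analogous to a 6-axis IMU: the semicircular canals play the role of a 3-axis gyro, the otolith organs a 3-axis accelerometer. A sketch of that analogy with made-up readings, showing why six measured axes still leave orientation to be inferred:

    import numpy as np

    # Hypothetical instantaneous readings, IMU-style (values made up):
    angular_velocity = np.array([0.1, -0.3, 0.0])   # rad/s, ~semicircular canals
    linear_accel     = np.array([0.0,  0.0, 9.81])  # m/s^2, ~otolith organs

    # Six measured quantities, but absolute orientation is not among them:
    # attitude has to be integrated or inferred over time, which is where
    # the "additional sense information" above comes in.
    reading = np.concatenate([angular_velocity, linear_accel])  # shape (6,)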
|
|
| ▲ | AIPedant 6 days ago | parent | prev [-] |
It is simply wrong to describe touch and proprioception receptors as 2D.

a) In a technical sense the actual receptors are 1D, not 2D. Perhaps some of them are two-dimensional, but generally mechanical touch is about pressure or tension along a single direction or axis.

b) The rods and cones in your eyes are also 1D receptors, but they combine to give a direct 2D image, and then higher-level processing infers depth. Touch and proprioception, however, combine to give a direct 3D image.

Maybe you mean that the surface of the skin is two-dimensional and so is touch? But the brain does not separate touch on the hand from its knowledge of where the hand is in space. Intentionally confusing this system is the basis of the "rubber hand illusion": https://en.wikipedia.org/wiki/Body_transfer_illusion
| |
▲ | Nevermark 6 days ago | parent [-]
I think you mean 0D for individual receptors: point (i.e. single point/element) receptors that each encode a single magnitude of perception. The cochlea could be thought of as 1D: magnitude (audio volume) measured across one dimension of N frequencies, so a 1D vector. Vision, and (locally) touch/pressure/heat maps, would both be 2D.
▲ | AIPedant 6 days ago | parent [-]
No, the sensors measure a continuum of force or displacement along a line or rotational axis; 1D is correct.
▲ | Nevermark 6 days ago | parent [-]
That would be a different use of "dimension". The measurement of any one of those is a 0-dimensional tensor, a single number. But then you are right: what is being measured by that one sensor is 1-dimensional. Then again, all single sensors measure across a 1-dimensional variable, whether it's linear pressure, rotation, light intensity, audio volume at one frequency, etc.
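The two uses of "dimension" in this exchange, restated as array shapes (NumPy as the illustration; the receptor counts are placeholders):

    import numpy as np

    # One receptor reports one magnitude: a 0-D tensor (a scalar).
    pressure = np.float64(0.7)       # shape ()

    # The cochlea reports magnitude across ~N frequency bands: a 1-D vector.
    cochlea = np.zeros(3500)         # shape (3500,), placeholder count

    # A retina or a patch of skin lays such scalars out in a 2-D map.
    retina = np.zeros((1000, 1000))  # shape (1000, 1000), placeholder size

    # Separately, what any single sensor measures (pressure along one axis,
    # volume at one frequency) varies along a 1-D continuum -- the other
    # sense of "dimension" used above.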
|