There is no such thing as "thing" here.

These models are trained so that the given conditions (the visual input and the text prompt) are followed by a desirable continuation (motor function over time).

The only dimension along which accuracy can apply is the desirability of that continuation.


jayd16 a year ago

You don't think there's any segmentation going on?

thomastjeffery a year ago

Implicitly, maybe. Does that matter if you don't know where?