"The first time you've seen these objects" is a weird thing to say. One presumes that this is already in their training set, and that these models aren't storing a huge amount of data in their context, so what does that even mean?

▲

jayd16 8 months ago | parent | next [-]

It probably gives them confidence that they can accurately see a thing even though they don't know what that thing is.

I could also imagine a lot of safety around leaving things outside of the current task alone so you might have to bend over backwards to get new objects worked on.

▲

thomastjeffery 8 months ago | parent [-]

There is no such thing as "thing" here.

These models are trained such that the given conditions (the visual input and the text prompt) will be continued with a desirable continuation (motor function over time).

The only dimension accuracy can apply to is desirability.

▲

jayd16 8 months ago | parent [-]

You don't think there's any segmentation going on?

	▲	thomastjeffery 8 months ago \| parent [-]
		Implicitly, maybe. Does that matter if you don't know where?

▲

ygouzerh 8 months ago | parent | prev | next [-]

So from what I understand it actually means that they were for example never trained on a video of an apple. Maybe only on a video of bread, pineapple, chocolate.

However, as it was trained using generic text data similarly to a normal LLM, it knows how an apple is supposed to look like.

Similar than a kid that never saw a banana, but his parent described it to him.

▲

Symmetry 8 months ago | parent | prev [-]

It's normal to have a training set and a validation set and I interpreted that to mean that these items weren't in the training set.