dvfjsdhgfv a day ago

One thing I'm 100% sure of is that a cutoff date doesn't exist for any large model, or rather there is no single date, since that's practically impossible to achieve.

sib 8 hours ago | parent | next [-]

But I think the general meaning of a cutoff date, D, is:

The model includes nothing AFTER date D

and not

The model includes everything ON OR BEFORE date D

Right? Definitionally, the model can't include anything that happened after training stopped.

dvfjsdhgfv 3 hours ago | parent [-]

That's correct. However, it is almost meaningless in practice, as it might as well mean that, say, 99.99% of the content is 2 years old or older, and only 0.01% is from just before that date. So if you need functionality that depends on new information, you have to test it for each particular component you need.

Unfortunately I work with new APIs all the time and the cutoff date is not of much use.

koolba a day ago | parent | prev | next [-]

Indeed. It’s not possible to stop the world and snapshot the entire internet in a single day.

Or is it?

gf000 8 hours ago | parent | next [-]

You can trivially put an upper bound on it, though. If the training finished today, then today is a cutoff date.

dragonwriter 8 hours ago | parent | prev | next [-]

That's... not what a cutoff date means. Cutoff date is an upper bound, not a promise that the model is trained on every piece of information set in a fixed form before that date.
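
A toy sketch of that distinction, assuming a made-up corpus (the dates, counts, and function names below are invented for illustration and describe no real model): the cutoff is just the latest date present, and it says nothing about how much recent material there actually is.

```python
from datetime import date

# Hypothetical corpus: (document date, number of documents). Invented numbers.
corpus = [
    (date(2021, 6, 1), 6_000),
    (date(2022, 6, 1), 3_999),
    (date(2024, 5, 30), 1),
]

def cutoff_date(corpus):
    """Upper bound: the latest document date present in the corpus."""
    return max(d for d, _ in corpus)

def share_on_or_after(corpus, since):
    """Fraction of documents dated on or after `since`."""
    total = sum(n for _, n in corpus)
    recent = sum(n for d, n in corpus if d >= since)
    return recent / total

cutoff = cutoff_date(corpus)
recent_share = share_on_or_after(corpus, date(2024, 1, 1))
print(cutoff, f"{recent_share:.2%}")
```

Here the cutoff is 2024-05-30, yet only one document in ten thousand is from 2024, so the date alone tells you little about coverage of recent topics.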

tough a day ago | parent | prev [-]

You would have an append-only incremental backup snapshot of the world.

tonyhart7 a day ago | parent | prev [-]

It's not a definitive "date" at which you cut off information, but more the most "recent" material you can feed in; training takes time.

If you keep waiting for new information, of course you are never going to train.